I'm writing a function that needs to take slices of a tensor with an arbitrary number of dimensions. The slice will always be taken along the batch dimension, dim 0.
Here is a simple example of what I want:
def masktensor(X, array_of_indices):
    return X[array_of_indices, *]  # what goes in place of * to allow a variable number of dims?
I want to be able to feed it various sized tensors and get the indexed slices out as a batch. Such as:
A = torch.rand(1000, 3, 32, 32)
B = torch.rand(1000, 5, 20)
indices = np.arange(10)
A_batch = masktensor(A, indices)
B_batch = masktensor(B, indices)
Thanks in advance!
After a bit of trial and error, I found that ... works as a slice wildcard:
def masktensor(X, array_of_indices):
    return X[array_of_indices, ...]  # <---- '...' allows a variable number of dims
A = torch.rand(1000, 3, 32, 32)
B = torch.rand(1000, 5, 20)
indices = np.arange(10)
print(masktensor(A, indices).size())
print(masktensor(B, indices).size())
Yields:
>>> torch.Size([10, 3, 32, 32])
>>> torch.Size([10, 5, 20])
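As a side note, ... is Python's built-in Ellipsis object, and since the indexed dimension here is the leading one, plain integer-array indexing gives the same result; a quick check:
import numpy as np
import torch

A = torch.rand(1000, 3, 32, 32)
indices = np.arange(10)
# indexing only the leading dim keeps all trailing dims automatically
print(torch.equal(A[indices, ...], A[indices]))  # True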
My program creates a numpy array inside a for loop. For example, it creates an array of shape (100, 30, 10), then (160, 30, 10), and then maybe (120, 30, 10). I have to append these to an initially empty numpy array so that, at the end of the loop, I have a numpy array of shape (380, 30, 10) (i.e. the sum 100+160+120 along the first dimension). The second and third dimensions don't change.
How can I do this in Python? I tried the following.
np_model = np.append(np_model,np_temp1)
print("Appended model shape is",np_model.shape)
np_label = np.append(np_label,np_temp2)
print("Appended label shape is",np_label.shape)
np_model is an empty array which I have defined as np_model = np.empty((1, 30, 10)), and np_label as np_label = np.empty(1, str).
np_temp1 corresponds to the array created in each loop iteration (100x30x10, 120x30x10, etc.), and np_temp2 is a string such as "item1", "item2", etc.
np_label is a string numpy array with one label corresponding to np_temp1.shape[0]. But the result I get in np_model is a flattened array of size 380*30*10 = 114000.
Any help is appreciated.
You can use NumPy's concatenate function: append the output arrays to a list inside the loop and then feed the list to np.concatenate:
empty_list = []
x = np.zeros([10, 20, 4])
y = np.zeros([12, 20, 4])
empty_list.append(x)
empty_list.append(y)
z = np.concatenate(empty_list, axis=0)
print(x.shape, y.shape, z.shape)
(10, 20, 4) (12, 20, 4) (22, 20, 4)
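As a side note on the original attempt: np.append with no axis argument flattens both inputs, which is why np_model came out one-dimensional; passing axis=0 makes it behave like concatenate. A minimal demonstration:
a = np.zeros((2, 30, 10))
b = np.ones((3, 30, 10))
print(np.append(a, b).shape)  # (1500,) - flattened without axis
print(np.append(a, b, axis=0).shape)  # (5, 30, 10) - concatenated along axis 0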
As @Nullman suggested in a comment, you can use np.vstack.
You can create the empty array like this:
>>> np_model = np.empty((0,30,10))
>>> a = np.random.rand(100,30,10)
>>> b = np.random.rand(160,30,10)
>>> c = np.random.rand(120,30,10)
# This can be done in one line: np_model = np.vstack((a, b, c)),
# but I guess you have a loop dependency here
>>> np_model = np.vstack((np_model,a))
>>> np_model = np.vstack((np_model,b))
>>> np_model = np.vstack((np_model,c))
>>> np_model.shape
(380, 30, 10)
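One caveat: calling np.vstack (or np.concatenate) inside the loop copies the entire accumulated array on every iteration. If the loop runs many times, it is usually faster to collect the per-iteration arrays in a list and stack once at the end. A minimal sketch (the shapes are stand-ins for the per-iteration outputs):
chunks = []
for n in (100, 160, 120):  # stand-ins for the per-iteration first dimensions
    chunks.append(np.random.rand(n, 30, 10))
np_model = np.vstack(chunks)  # one allocation and copy at the end
print(np_model.shape)  # (380, 30, 10)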
To specifically answer the part about starting with an empty array, here is my solution, using only np.concatenate:
import numpy as np
# Some arrays to append in a loop
arrays = (
    np.random.rand(100, 30, 10),
    np.random.rand(160, 30, 10),
    np.random.rand(120, 30, 10)
)
# Initial empty array
array = np.zeros((0, 30, 10))
# Appending arrays in loop
for a in arrays:
    array = np.concatenate((array, a), axis=0)
# Output shape
print(array.shape)
Output:
(380, 30, 10)
Hope that helps!
----------------------------------------
System information
----------------------------------------
Platform: Windows-10-10.0.16299-SP0
Python: 3.8.1
NumPy: 1.18.1
----------------------------------------
I have a dataset where one of the similar-looking classes is imbalanced. It is a digit dataset where class labels go from 1 to 10.
Grouping by label (y) on the training set gives the following output:
(array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10], dtype=uint8),
 array([13861, 10585,  8497,  7458,  6882,  5727,  5595,  5045,  4659,  4948]))
As can be seen, class 1 has 13861 data points while class 7 has only 5595.
To reduce the class imbalance between 1 and 7, I want to add some extra images for class 7.
Here is the train set:
from scipy.io import loadmat
train = loadmat('train.mat')
extra = loadmat('extra.mat')
Both train and extra are dictionaries with 2 keys X and y each.
Here is the shape of train and extra:
train['X'] --> (32, 32, 3, 73257)
# 73257 images of 32x32x3
train['y'] --> (73257,1)
# 73257 labels of corresponding images
extra['X'] --> (32, 32, 3, 531131)
# 531131 images of 32x32x3
extra['y'] --> (531131, 1)
# 531131 labels of corresponding images
Now I want to update the train dataset with data from extra, primarily taking x% of the data labeled 7 in extra into train. How can I do this?
I tried the following:
arr, _ = np.where(extra['y'] == 7)
c = np.concatenate(X_train, extra['X'][arr])
But I get an error saying IndexError: index 32 is out of bounds for axis 0 with size 32
Here is a working example on plain numpy arrays that translates easily to your case. As in your edit, use numpy.where to find the labels you want in extra['y'] and keep those indices. They are then used together with numpy.append to concatenate your original dataset with the extra one (along the last axis for X and the first axis for y). Incidentally, the IndexError in your attempt comes from extra['X'][arr], which indexes the first axis (size 32) with image indices; the images are stacked along the last axis, so you need extra['X'][..., arr].
import numpy as np
np.random.seed(100)
# First find the indices of your y_extra with label 7
x_extra = np.random.rand(32, 32, 3, 10)
y_extra = np.random.randint(0, 9, size=(10,1))
indices = np.where(y_extra==7)[0] # indices [3,4] are 7 with seed=100
# Now use these indices to concatenate into the original dataset
np.random.seed(101)
x_original = np.random.rand(32, 32, 3, 10)
y_original = np.random.randint(1, 10, size=(10,1))
print(x_original.shape, x_extra[..., indices].shape) # (32, 32, 3, 10) (32, 32, 3, 2)
print(y_original.shape, y_extra[indices].shape) # (10, 1) (2, 1)
x_final = np.append(x_original, x_extra[..., indices], axis=-1)
y_final = np.append(y_original, y_extra[indices], axis=0)
print(x_final.shape, y_final.shape) # (32, 32, 3, 12) (12, 1)
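Translating this back to the question's variables, here is a sketch of moving x% of the label-7 samples from extra into train (the fraction x and the sampling via np.random.choice are assumptions for illustration, not part of the original answer):
x = 0.10  # assumed fraction of extra label-7 data to take
# indices of label 7 in extra
idx7 = np.where(extra['y'][:, 0] == 7)[0]
# randomly sample x% of them without replacement
take = np.random.choice(idx7, size=int(len(idx7) * x), replace=False)
# images are stacked on the last axis, labels on the first
train['X'] = np.append(train['X'], extra['X'][..., take], axis=-1)
train['y'] = np.append(train['y'], extra['y'][take], axis=0)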
I have a tensor of shape (16, 4096, 3). I have another tensor of indices of shape (16, 32768, 3). I am trying to collect the values along dim=1. This was initially done in PyTorch using the gather function as shown below:
# a.shape (16L, 4096L, 3L)
# idx.shape (16L, 32768L, 3L)
b = a.gather(1, idx)
# b.shape (16L, 32768L, 3L)
Please note that the size of the output b is the same as that of idx. However, when I apply TensorFlow's gather function, I get a completely different output; the output dimensions do not match, as shown below:
b = tf.gather(a, idx, axis=1)
# b.shape (16, 16, 32768, 3, 3)
I also tried using tf.gather_nd, but in vain. See below:
b = tf.gather_nd(a, idx)
# b.shape (16, 32768)
Why am I getting different shapes of tensors? I want to get the tensor of the same shape as calculated by pytorch.
In other words, I want to know the tensorflow equivalent of torch.gather.
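For reference, torch.gather with dim=1 computes b[i, j, k] = a[i, idx[i, j, k], k], which is exactly why b has the same shape as idx. A naive NumPy restatement of those semantics (for illustration only, not an efficient implementation):
import numpy as np

def gather_dim1(a, idx):
    # b[i, j, k] = a[i, idx[i, j, k], k]
    b = np.empty(idx.shape, dtype=a.dtype)
    for i in range(idx.shape[0]):
        for j in range(idx.shape[1]):
            for k in range(idx.shape[2]):
                b[i, j, k] = a[i, idx[i, j, k], k]
    return b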
For the 2D case, there is a method to do it:
# a.shape (16L, 10L)
# idx.shape (16L,1)
idx = tf.stack([tf.range(tf.shape(idx)[0]),idx[:,0]],axis=-1)
b = tf.gather_nd(a,idx)
However, for the ND case, this method may become very complex.
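For concreteness, a small self-contained check of the 2D method (shapes taken from the comments above):
import tensorflow as tf

a = tf.random.uniform((16, 10))
idx = tf.random.uniform((16, 1), maxval=10, dtype=tf.int32)
rows = tf.stack([tf.range(tf.shape(idx)[0]), idx[:, 0]], axis=-1)
b = tf.gather_nd(a, rows)
print(b.shape)  # (16,) - one gathered value per row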
This "should" be a general solution using tf.gather_nd (I've only tested for rank 2 and 3 tensors along the last axis):
def torch_gather(x, indices, gather_axis):
    # if pytorch gather indices are
    # [[[0, 10, 20], [0, 10, 20], [0, 10, 20]],
    #  [[0, 10, 20], [0, 10, 20], [0, 10, 20]]]
    # tf.gather_nd needs them to be
    # [[0,0,0], [0,0,10], [0,0,20], [0,1,0], [0,1,10], [0,1,20], [0,2,0], [0,2,10], [0,2,20],
    #  [1,0,0], [1,0,10], [1,0,20], [1,1,0], [1,1,10], [1,1,20], [1,2,0], [1,2,10], [1,2,20]]

    # create a tensor containing the indices of every element
    all_indices = tf.where(tf.fill(indices.shape, True))
    # cast to int64 to match tf.where's output, so tf.stack below doesn't fail on int32 indices
    gather_locations = tf.cast(tf.reshape(indices, [indices.shape.num_elements()]), tf.int64)

    # splice in our pytorch-style index at the correct axis
    gather_indices = []
    for axis in range(len(indices.shape)):
        if axis == gather_axis:
            gather_indices.append(gather_locations)
        else:
            gather_indices.append(all_indices[:, axis])

    gather_indices = tf.stack(gather_indices, axis=-1)
    gathered = tf.gather_nd(x, gather_indices)
    reshaped = tf.reshape(gathered, indices.shape)
    return reshaped
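A quick usage check against the shapes from the question (a sketch; assumes TF 2.x eager execution so that indices.shape is fully defined):
import tensorflow as tf

a = tf.random.uniform((16, 4096, 3))
idx = tf.random.uniform((16, 32768, 3), maxval=4096, dtype=tf.int64)
b = torch_gather(a, idx, gather_axis=1)
print(b.shape)  # (16, 32768, 3) - same shape as idx, matching torch.gather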
For last-axis gathering, we can use the 2D-reshape trick for general ND cases and then employ @LiShaoyuan's 2D code from above:
# last-axis gathering only - uses the 2D-reshape trick for Torch-style nD gathering
def torch_gather(param, id_tensor):
    # 2D gather, torch equivalent from @LiShaoyuan above
    def gather2d(target, id_tensor):
        idx = tf.stack([tf.range(tf.shape(id_tensor)[0]), id_tensor[:, 0]], axis=-1)
        result = tf.gather_nd(target, idx)
        return tf.expand_dims(result, axis=-1)

    target = tf.reshape(param, (-1, param.shape[-1]))  # reshape to 2D
    target_shape = id_tensor.shape
    id_tensor = tf.reshape(id_tensor, (-1, 1))  # also reshape the index to 2D
    result = gather2d(target, id_tensor)
    return tf.reshape(result, target_shape)
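A small usage sketch (note that this variant effectively assumes the index tensor has a trailing dimension of 1, so each row picks a single element along the last axis):
import tensorflow as tf

param = tf.random.uniform((4, 5, 16))
ids = tf.random.uniform((4, 5, 1), maxval=16, dtype=tf.int32)
out = torch_gather(param, ids)
print(out.shape)  # (4, 5, 1)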
I have a numpy array image_stack of size 64x28x28x3, which corresponds to 64 images of size 28x28x3. What I want is to construct an image of size 224x224x3 that contains all the images from the initial array. How can I do so in numpy? So far I have code that stacks the images in a single row; however, I want 8 rows of 8 columns instead. My code so far:
def tile_images(image_stack):
    """Given a stacked tensor of images, reshapes them into a horizontal tiling for display."""
    assert len(image_stack.shape) == 4
    image_list = [image_stack[i, :, :, :] for i in range(image_stack.shape[0])]
    tiled_images = np.concatenate(image_list, axis=1)
    return tiled_images
Does the following reshape, transpose, reshape trick work?
x.shape # (64, 28, 28, 3)
mosaic = x.reshape(8, 8, 28, 28, 3).transpose((0, 2, 1, 3, 4)).reshape(224, 224, 3)
The first reshape breaks your 64 into rows and columns. The transpose rearranges the axes so that the final reshape collapses them in a meaningful way.
Your function would then look like:
def tile_images(x):
    dims = x.shape
    assert len(dims) == 4
    stack_dim = int(np.sqrt(dims[0]))
    res = x.reshape(stack_dim, stack_dim, *dims[1:]).transpose((0, 2, 1, 3, 4))
    tile_size = res.shape[0] * res.shape[1]
    return res.reshape(tile_size, tile_size, -1)
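A quick sanity check with the shapes from the question:
import numpy as np

image_stack = np.random.rand(64, 28, 28, 3)
print(tile_images(image_stack).shape)  # (224, 224, 3)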