Binning data based on each row in numpy array - python

a = train_images.reshape(60000, 28, 28)
print(np.shape(a))
a_view = a.reshape(60000, 7, 4, 7, 4)
print(np.shape(a_view))
binned_out = []
for x in range(len(a)):
    bin_v = a_view[x].mean(axis=3).mean(axis=1)
    binned_out.append(bin_v)
In the above code, is there a way to replace the for loop to decrease execution time and get the binned data for all elements of the numpy array at once?
My train_images shape is (60000, 784)
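For reference, the loop can be removed entirely by averaging over both bin axes of the 5-D view in one call; a minimal sketch (equivalent to np.stack(binned_out), assuming the same (60000, 7, 4, 7, 4) view as above):
# axes 2 and 4 are the 4-wide bin axes, matching mean(axis=3).mean(axis=1) per image
binned_out = a_view.mean(axis=(2, 4))
print(np.shape(binned_out))  # (60000, 7, 7)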


Dynamically Slice Tensor of Arbitrary Dims

I'm writing a function that needs to take slices of a tensor with an arbitrary number of dimensions. The slice will always be along the batch dimension, 0.
Here is a simple example of what I want:
def masktensor(X, array_of_indices):
    return X[array_of_indices, *] # edit * to allow variable number of dims
I want to be able to feed it various sized tensors and get the indexed slices out as a batch. Such as:
A = torch.rand(1000, 3, 32, 32)
B = torch.rand(1000, 5, 20)
indices = np.arange(10)
A_batch = masktensor(A, indices)
B_batch = masktensor(B, indices)
Thanks in advance!
After a bit of trial and error, I found that ... works as a slice wildcard:
def masktensor(X, array_of_indices):
    return X[array_of_indices, ...] # <---- '...' allows variable dim size
A = torch.rand(1000, 3, 32, 32)
B = torch.rand(1000, 5, 20)
indices = np.arange(10)
print(masktensor(A, indices).size())
print(masktensor(B, indices).size())
Yields:
>>> torch.Size([10, 3, 32, 32])
>>> torch.Size([10, 5, 20])
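Worth noting: ... is Python's Ellipsis, which expands to as many full slices as needed. When only the leading dimension is indexed, plain X[array_of_indices] behaves the same:
def masktensor(X, array_of_indices):
    return X[array_of_indices]  # trailing dims are kept automatically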

Append a numpy array with different first dimensions

My program creates a numpy array within a for loop. For example, it creates an array of shape (100, 30, 10), then (160, 30, 10), and then maybe (120, 30, 10). I have to append these to an empty numpy array so that, at the end of the loop, I have a single numpy array of shape (380, 30, 10) (i.e. 100+160+120 along the first axis). The second and third dimensions never change.
How can I do this in Python? I tried the following:
np_model = np.append(np_model,np_temp1)
print("Appended model shape is",np_model.shape)
np_label = np.append(np_label,np_temp2)
print("Appended label shape is",np_label.shape)
np_model is an empty array which I have defined as np_model = np.empty((1, 30, 10)) and np_label as np_label = np.empty(1, str).
np_temp1 corresponds to the array produced in each loop iteration, with a shape like (100, 30, 10), (120, 30, 10), etc., and np_temp2 is a string such as "item1", "item2", etc.
np_label is a string numpy array with one label corresponding to each np_temp1. But the result I get in np_model is a flattened array of size 380*30*10 = 114000.
Any help is appreciated.
You can use numpy's concatenate function: append the output arrays to a list and then feed the list to concatenate:
empty_list = []
x = np.zeros([10, 20, 4])
y = np.zeros([12, 20, 4])
empty_list.append(x)
empty_list.append(y)
z = np.concatenate(empty_list, axis=0)
print(x.shape, y.shape, z.shape)
(10, 20, 4) (12, 20, 4) (22, 20, 4)
As @Nullman suggested in a comment, you can use np.vstack. You can create the empty array like this:
>>> np_model = np.empty((0,30,10))
>>> a = np.random.rand(100,30,10)
>>> b = np.random.rand(160,30,10)
>>> c = np.random.rand(120,30,10)
# It can be done in one line like np_model = np.vstack((a,b,c))
# but I guess you have a loop dependency here
>>> np_model = np.vstack((np_model,a))
>>> np_model = np.vstack((np_model,b))
>>> np_model = np.vstack((np_model,c))
>>> np_model.shape
(380, 30, 10)
To specifically answer your question about starting with an empty array, here is my solution, using only np.concatenate:
import numpy as np
# Some arrays to append in a loop
arrays = (
np.random.rand(100, 30, 10),
np.random.rand(160, 30, 10),
np.random.rand(120, 30, 10)
)
# Initial empty array
array = np.zeros((0, 30, 10))
# Appending arrays in loop
for a in arrays:
    array = np.concatenate((array, a), axis=0)
# Output shape
print(array.shape)
Output:
(380, 30, 10)
Hope that helps!
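A note on performance: np.concatenate (and np.vstack) inside a loop copies the accumulated array on every iteration, so the loop above scales quadratically with the total size. Collecting the pieces in a list and concatenating once, as in the first answer, avoids this; a sketch:
chunks = []
for a in arrays:  # same loop as above
    chunks.append(a)  # cheap: no array copy per iteration
array = np.concatenate(chunks, axis=0)  # one copy at the end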

Copying values from a numpy array to balance the dataset

I have a number dataset, with class labels from 1 to 10, in which one of the similar-looking classes is imbalanced.
Grouping by label (y) on the training set gives the following output:
(array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10], dtype=uint8),
 array([13861, 10585,  8497,  7458,  6882,  5727,  5595,  5045,  4659,  4948]))
As can be seen, class 1 has 13861 data points while class 7 has only 5595.
To reduce the class imbalance between 1 and 7, I want to add some extra images for class 7.
Here is train set:
from scipy.io import loadmat
train = loadmat('train.mat')
extra = loadmat('extra.mat')
Both train and extra are dictionaries with 2 keys X and y each.
Here is the shape of train and extra:
train['X'] --> (32, 32, 3, 73257)
# 73257 images of 32x32x3
train['y'] --> (73257,1)
# 73257 labels of corresponding images
extra['X'] --> (32, 32, 3, 531131)
# 531131 images of 32x32x3
extra['y'] --> (531131, 1)
# 531131 labels of corresponding images
Now I want to update the train dataset with data from extra, primarily taking x% of the images labelled 7 in extra into train. How can I do this?
I tried the following:
arr, _ = np.where(extra['y'] == 7)
c = np.concatenate(X_train, extra['X'][arr])
But I get an error saying IndexError: index 32 is out of bounds for axis 0 with size 32
Here is a working example on plain numpy arrays that translates easily to your case. As in your edit, use numpy.where to find the labels you want in extra['y'] and keep those indices. These are then used together with numpy.append to concatenate your original dataset with the extra one (along the last axis for X and the first axis for y).
import numpy as np
np.random.seed(100)
# First find the indices of your y_extra with label 7
x_extra = np.random.rand(32, 32, 3, 10)
y_extra = np.random.randint(0, 9, size=(10,1))
indices = np.where(y_extra==7)[0] # indices [3,4] are 7 with seed=100
# Now use these indices to concatenate the extras onto the original dataset
np.random.seed(101)
x_original = np.random.rand(32, 32, 3, 10)
y_original = np.random.randint(1, 10, size=(10,1))
print(x_original.shape, x_extra[..., indices].shape) # (32, 32, 3, 10) (32, 32, 3, 2)
print(y_original.shape, y_extra[indices].shape) # (10, 1) (2, 1)
x_final = np.append(x_original, x_extra[..., indices], axis=-1)
y_final = np.append(y_original, y_extra[indices], axis=0)
print(x_final.shape, y_final.shape) # (32, 32, 3, 12) (12, 1)
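Translated back to the question's variables, a sketch could look like the following (the 20% fraction is a stand-in assumption for the x% mentioned in the question):
# take a random 20% (hypothetical fraction) of the extra images labelled 7
idx7 = np.where(extra['y'][:, 0] == 7)[0]
np.random.shuffle(idx7)
take = idx7[:int(0.2 * len(idx7))]
train['X'] = np.append(train['X'], extra['X'][..., take], axis=-1)
train['y'] = np.append(train['y'], extra['y'][take], axis=0)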

tensorflow equivalent of torch.gather

I have a tensor of shape (16, 4096, 3) and another tensor of indices of shape (16, 32768, 3). I am trying to collect values along dim=1. This was initially done in PyTorch using the gather function, as shown below:
# a.shape (16L, 4096L, 3L)
# idx.shape (16L, 32768L, 3L)
b = a.gather(1, idx)
# b.shape (16L, 32768L, 3L)
Note that the output b has the same size as idx. However, when I apply TensorFlow's gather function, I get a completely different output; the dimensions do not match, as shown below:
b = tf.gather(a, idx, axis=1)
# b.shape (16, 16, 32768, 3, 3)
I also tried tf.gather_nd, but in vain. See below:
b = tf.gather_nd(a, idx)
# b.shape (16, 32768)
Why am I getting tensors of different shapes? I want a tensor of the same shape as the one computed by PyTorch.
In other words, what is the TensorFlow equivalent of torch.gather?
For the 2D case, there is a way to do it:
# a.shape (16L, 10L)
# idx.shape (16L, 1)
idx = tf.stack([tf.range(tf.shape(idx)[0]), idx[:, 0]], axis=-1)
b = tf.gather_nd(a, idx)
However, for the ND case this method may become very complex.
This "should" be a general solution using tf.gather_nd (I've only tested for rank 2 and 3 tensors along the last axis):
def torch_gather(x, indices, gather_axis):
    # if pytorch gather indices are
    # [[[0, 10, 20], [0, 10, 20], [0, 10, 20]],
    #  [[0, 10, 20], [0, 10, 20], [0, 10, 20]]]
    # tf gather_nd needs them to be
    # [[0,0,0], [0,0,10], [0,0,20], [0,1,0], [0,1,10], [0,1,20], [0,2,0], [0,2,10], [0,2,20],
    #  [1,0,0], [1,0,10], [1,0,20], [1,1,0], [1,1,10], [1,1,20], [1,2,0], [1,2,10], [1,2,20]]
    # create a tensor containing the index of every element
    all_indices = tf.where(tf.fill(indices.shape, True))
    gather_locations = tf.reshape(indices, [indices.shape.num_elements()])
    # splice in our pytorch-style index at the correct axis
    gather_indices = []
    for axis in range(len(indices.shape)):
        if axis == gather_axis:
            gather_indices.append(gather_locations)
        else:
            gather_indices.append(all_indices[:, axis])
    gather_indices = tf.stack(gather_indices, axis=-1)
    gathered = tf.gather_nd(x, gather_indices)
    reshaped = tf.reshape(gathered, indices.shape)
    return reshaped
For last-axis gathering only, we can use the 2D reshape trick for general ND cases and then employ @LiShaoyuan's 2D code from above:
# last-axis gathering only - use the 2D reshape trick for Torch-style nD gathering
def torch_gather(param, id_tensor):
    # 2D gather, torch equivalent from @LiShaoyuan above
    def gather2d(target, id_tensor):
        idx = tf.stack([tf.range(tf.shape(id_tensor)[0]), id_tensor[:, 0]], axis=-1)
        result = tf.gather_nd(target, idx)
        return tf.expand_dims(result, axis=-1)
    target = tf.reshape(param, (-1, param.shape[-1]))  # reshape to 2D
    target_shape = id_tensor.shape
    id_tensor = tf.reshape(id_tensor, (-1, 1))  # also a 2D index
    result = gather2d(target, id_tensor)
    return tf.reshape(result, target_shape)
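If your TensorFlow version ships tf.experimental.numpy (added around TF 2.4; treat the availability as an assumption to verify), take_along_axis matches torch.gather's semantics directly:
import tensorflow as tf
# assumes tf.experimental.numpy is available in your TF version
b = tf.experimental.numpy.take_along_axis(a, idx, axis=1)
# b.shape (16, 32768, 3), same as torch.gather(a, 1, idx)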

Create a map of images in python numpy

I have a numpy array image_stack of size 64x28x28x3, which corresponds to 64 images of size 28x28x3. What I want is to construct an image of size 224x224x3 that contains all the images from the initial array. How can I do this in numpy? So far I have code that stacks the images in a single row; however, I want 8 rows of 8 columns instead. My code so far:
def tile_images(image_stack):
    """Given a stacked tensor of images, reshapes them into a horizontal tiling for display."""
    assert len(image_stack.shape) == 4
    image_list = [image_stack[i, :, :, :] for i in range(image_stack.shape[0])]
    tiled_images = np.concatenate(image_list, axis=1)
    return tiled_images
Does the following reshape, transpose, reshape trick work?
x.shape # (64, 28, 28, 3)
mosaic = x.reshape(8, 8, 28, 28, 3).transpose((0, 2, 1, 3, 4)).reshape(224, 224, 3)
The first reshape splits your 64 into rows and columns. The transpose rearranges the axes so that the final reshape collapses them in a meaningful way.
Your function would then look like:
def tile_images(x):
    dims = x.shape
    assert len(dims) == 4
    stack_dim = int(np.sqrt(dims[0]))
    res = x.reshape(stack_dim, stack_dim, *dims[1:]).transpose((0, 2, 1, 3, 4))
    tile_size = res.shape[0] * res.shape[1]
    return res.reshape(tile_size, tile_size, -1)
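A quick sanity check of the generalized function (any square number of images should work):
x = np.random.rand(64, 28, 28, 3)
print(tile_images(x).shape)  # (224, 224, 3)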
