Increase the size of a np.array - python

I ran a Conv1D on an X matrix of shape (2000, 20, 28): 2000 samples, 20 time steps and 28 features.
I would like to move on to a Conv2D CNN and increase the dimensionality of my matrix to (2000, 20, 28, 10), with 10 elements for each of which I can build a (2000, 20, 28) X matrix. Similarly, I want to get a y array of shape (2000, 10), i.e. 10 times the y array of shape (2000,) that I used for the LSTM and Conv1D networks.
The code I used to create the 20 time steps from the inputs dataX and dataY was:
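import numpy as np

def LSTM_create_dataset(dataX, dataY, seq_length, step):
    # Slide a window of seq_length rows over dataX; each window becomes one sample
    Xs, ys = [], []
    for i in range(0, len(dataX) - seq_length, step):
        v = dataX.iloc[i:(i + seq_length)].values
        Xs.append(v)
        ys.append(dataY.iloc[i + seq_length])  # target: the value right after the window
    return np.array(Xs), np.array(ys)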
I use this function within the loop I prepared to create the data for my Conv2D network:
Xconv, yconv = [], []
for ric in rics:
    dataX, dataY = get_model_data(dbInput, dbList, ric, horiz, drop_rows, triggerUp1, triggerLoss, triggerUp2=0)
    dataX = get_model_cleanXset(dataX, trigger)  # Clean X matrix for insufficient data
    Xs, ys = LSTM_create_dataset(dataX, dataY, seq_length, step)  # slide over seq_length for a 3D matrix
    Xconv.append(Xs)
    yconv.append(ys)
I obtain an Xconv matrix of shape (10, 2000, 20, 28) instead of the targeted (2000, 20, 28, 10), and a yconv matrix of shape (10, 2000) instead of the targeted (2000, 10).
I know that I can easily reshape yconv with yconv = np.reshape(yconv, (2000, 5)). But reshaping Xconv with Xconv = np.reshape(Xconv, (2000, 20, 28, 10)) seems risky, since I cannot visualize the output, and it may even be wrong.
How could I do it safely (or could you confirm that my first attempt is correct)?
Thanks a lot in advance.

If your y matrix has shape (10, 2000), you will not be able to reshape it to your desired (2000, 5), as demonstrated below.
import numpy as np

# create an array with the same shape as your original y
arr_1 = np.arange(0, 2000*10).reshape(10, 2000)
print(arr_1.shape)  # (10, 2000)
arr_1 = arr_1.reshape(2000, 5)
This raises the following error, because the total number of elements must be the same before and after the reshape (10 × 2000 = 20000, but 2000 × 5 = 10000):
ValueError: cannot reshape array of size 20000 into shape (2000,5)
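Even for the target (2000, 10), which does hold the same 20000 elements, reshape is not the right tool: it reinterprets the data in flat memory order instead of swapping axes. A minimal NumPy sketch, using synthetic labels in place of the real yconv (an assumption for illustration), contrasting .T / np.stack with reshape:

```python
import numpy as np

# Synthetic stand-in for yconv: 10 label vectors of 2000 samples each.
# Each value encodes its position: element * 10000 + sample index.
yconv = [np.arange(2000) + k * 10000 for k in range(10)]
y_stacked = np.asarray(yconv)        # shape (10, 2000)

# Transposing swaps the axes: row i now holds the 10 labels of sample i.
y_t = y_stacked.T                     # shape (2000, 10)
# Equivalent: stack the list along a new last axis.
y_s = np.stack(yconv, axis=1)         # shape (2000, 10)

# reshape keeps the flat (row-major) order, so it does NOT swap axes:
# row 1 simply becomes entries 10..19 of the first label vector.
y_r = y_stacked.reshape(2000, 10)

print(y_t[1])  # labels of sample 1: 1, 10001, 20001, ..., 90001
print(y_r[1])  # [10 11 12 13 14 15 16 17 18 19]
```

Both results have shape (2000, 10), which is why a shape check alone does not catch the problem.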
I do not fully understand the statement that you cannot visualise the output. You can check manually that reshape did what you expected, on your dataset (or on a small part of it), by printing slices as below and comparing them with your original data and with what you expect the result to look like.
import numpy as np
arr = np.arange(0,2000)
arr = arr.reshape(20,10,10,1) # reshape array to shape (20, 10, 10, 1)
# these statements let you examine the array contents at varying depths
print(arr[0][0][0])
print(arr[0][0])
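The same kind of check shows why np.reshape(Xconv, (2000, 20, 28, 10)) is wrong: reshape yields the right shape but not the right layout. Moving the element axis to the end with np.stack(..., axis=-1) or np.moveaxis does. The sketch below uses small synthetic arrays as a stand-in for Xconv (an assumption; the real arrays are (2000, 20, 28)), and it assumes every Xs has the same shape, since np.stack raises a ValueError otherwise.

```python
import numpy as np

# Small synthetic stand-in for Xconv: a list of n_el arrays, each of shape
# (n_samp, steps, feats), mirroring the list built by the loop in the question.
n_el, n_samp, steps, feats = 3, 4, 2, 5
base = np.arange(n_samp * steps * feats).reshape(n_samp, steps, feats)
Xconv = [base + 1000 * k for k in range(n_el)]

# Option 1: stack along a new last axis -> (n_samp, steps, feats, n_el)
X_stack = np.stack(Xconv, axis=-1)
# Option 2: build the (n_el, ...) array, then move axis 0 to the end
X_move = np.moveaxis(np.asarray(Xconv), 0, -1)
# Naive reshape: correct shape, but the values are scrambled
X_bad = np.asarray(Xconv).reshape(n_samp, steps, feats, n_el)

def matches(X):
    # X[..., k] should give back element k's original (n_samp, steps, feats) block
    return all(np.array_equal(X[..., k], Xconv[k]) for k in range(n_el))

print(X_stack.shape, X_move.shape, X_bad.shape)  # (4, 2, 5, 3) (4, 2, 5, 3) (4, 2, 5, 3)
print(matches(X_stack), matches(X_move), matches(X_bad))  # True True False
```

Applied to the question's data, np.stack(Xconv, axis=-1) would give (2000, 20, 28, 10), and np.stack(yconv, axis=-1) would give (2000, 10).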

Related

Why isn't Tensorflow/Keras Flatten layer flattening my array?

I am trying to use the tensorflow.keras.layers.Flatten layer outside of a model to flatten a 4x4 tensor. I can't figure out why the Flatten layer isn't actually flattening my array.
Here is my code:
import tensorflow as tf
import numpy as np
flayer = tf.keras.layers.Flatten()
X = tf.constant(np.random.random((4,4)),dtype=tf.float32)
Xf = flayer(X)
print(Xf)
and print(Xf) shows
tf.Tensor(
[[0.9866459 0.52488756 0.86211777 0.06254051]
[0.32552275 0.23201537 0.8646714 0.80754006]
[0.55823076 0.51929855 0.538077 0.4111973 ]
[0.95845264 0.14468837 0.30223057 0.09648433]], shape=(4, 4), dtype=float32)
Why doesn't my flatten layer output a 16x1 tensor?
That's because the Flatten() layer assumes that the first dimension is the number of samples, so it returns 4 flattened rows: you have 4 observations, and each of them is already 1D. It would behave differently if your data had shape (32, 28, 28, 1), for example, where each observation has more than one dimension.
import tensorflow as tf
import numpy as np
flayer = tf.keras.layers.Flatten()
X = tf.constant(np.random.random((32, 28, 28, 1)),dtype=tf.float32)
Xf = flayer(X)
print(Xf.shape)
(32, 784)
If you meant to flatten one observation with shape (4, 4), you should add a batch dimension for it to work:
X = tf.constant(np.random.random((1, 4, 4)),dtype=tf.float32)
Xf = flayer(X)
print(Xf.shape)
(1, 16)
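In terms of output shape, the rule Flatten applies can be mimicked in NumPy: keep the first (batch) axis and merge all the others. flatten_like_keras below is a hypothetical helper written for illustration, not part of Keras. To flatten one tensor completely, without a batch axis, tf.reshape(X, [-1]) is the TensorFlow counterpart of the last line.

```python
import numpy as np

def flatten_like_keras(x):
    # Hypothetical helper: keep axis 0 (the batch axis), collapse all remaining axes
    return x.reshape(x.shape[0], -1)

print(flatten_like_keras(np.zeros((4, 4))).shape)           # (4, 4): already one row per sample
print(flatten_like_keras(np.zeros((32, 28, 28, 1))).shape)  # (32, 784)
print(flatten_like_keras(np.zeros((1, 4, 4))).shape)        # (1, 16)
print(np.zeros((4, 4)).reshape(-1).shape)                   # (16,): flatten everything
```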

Problems passing tensor to linear layer - Pytorch

I'm trying to build a neural net, but I can't figure out where I'm going wrong with the max pooling layer.
self.embed1 = nn.Embedding(256, 8)
self.conv_1 = nn.Conv2d(1, 64, (7,8), padding = (0,0))
self.fc1 = nn.Linear(64, 2)

def forward(self,x):
    import pdb; pdb.set_trace()
    x = self.embed1(x)  # input a tensor of ([1,217]) output size: ([1, 217, 8])
    x = x.unsqueeze(0)  # conv layer needs a tensor of size (B x C x H x W) so unsqueeze here to make ([1, 1, 217, 8])
    x = self.conv_1(x)  # creates 64 filters of size (7, 8). Outputs ([1, 64, 211, 1]) as 6 values are lost due to no padding.
    x = torch.max(x,0)  # returning max over the 64 columns. This returns a tuple of length 2 with 64 values in each: the max values and indices.
    x = x[0]  # I only need the max values. This returns a tensor of size ([64, 211, 1])
    x = x.squeeze(2)  # linear layer only wants the number of inputs and number of outputs so I squeeze the tensor to ([64, 211])
    x = self.fc1(x)  # Error: size mismatch (M1: [64 x 211] M2: [64 x 2])
I understand why the linear layer isn't accepting 211, but I don't understand why my tensor after maxing over the columns isn't 64 x 2.
Your use of torch.max returns two outputs: the max values along dim=0 and the argmax along that dimension. Thus, you need to pick only the first output (you might want to consider using adaptive max pooling for this task).
Your linear layer expects its input to have dim 64 (that is, a batch_size-by-64 tensor). However, after the squeeze, x has shape (64, 211) (13504 values in total), so its last dimension is 211, definitely not 64.
See this thread for example.
If I'm guessing your intentions correctly, your mistake is that you're using torch.max for 2d maxpooling, instead of torch.nn.functional.max_pool2d. The former reduces across a tensor dimension (for instance across all feature maps or all horizontal lines), whereas the latter reduces in each square spatial neighborhood in the [h, w] plane of a [batch, features, h, w] tensor.
Instead of this:
x = x.squeeze(2)
You can do this instead:
x = x.view(-1, 64)  # view infers the number of rows: the (64, 211, 1) tensor becomes [211 x 64], which fc1 accepts
You can think of view as NumPy's reshape. We use -1 to let PyTorch infer the number of rows from the total size, since we only know that we want 64 columns.
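The shape bookkeeping behind these answers can be checked with NumPy, whose max(axis=...) and reshape behave like torch.max(x, dim)[0] and view as far as shapes go. This is a sketch on random data standing in for the conv output; in PyTorch the global pooling step could be written as torch.max(x, dim=2)[0] or with nn.AdaptiveMaxPool2d((1, 1)) followed by a flatten.

```python
import numpy as np

# Random stand-in for the conv output in the question: (batch, channels, length, width)
x = np.random.rand(1, 64, 211, 1)

# Reducing over axis 0 (what torch.max(x, 0) does) only removes the batch
# axis of size 1; nothing is pooled over the sequence.
print(x.max(axis=0).shape)            # (64, 211, 1)

# Global max pooling over the sequence axis keeps one value per filter,
# giving (batch, 64), which is what nn.Linear(64, 2) expects.
pooled = x.max(axis=2).squeeze(-1)
print(pooled.shape)                   # (1, 64)

# reshape(-1, 64) on the (64, 211) tensor infers 211 rows, so fc1 would
# produce 211 outputs of size 2, and rows no longer mean "one value per filter".
print(x[0, :, :, 0].reshape(-1, 64).shape)  # (211, 64)
```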

Concatenating two tensors with different dimensions in Pytorch

Is it possible to concatenate two tensors with different shapes without using a for loop?
For example, Tensor 1 has shape (15, 200, 2048) and Tensor 2 has shape (1, 200, 2048). Is it possible to concatenate the second tensor with the first along all 15 indices of the first dimension (i.e. broadcast Tensor 2 along the first dimension of Tensor 1 while concatenating along the third dimension)? The resulting tensor should have shape (15, 200, 4096).
Is it possible to accomplish this without a for loop?
You could do the broadcasting manually (using Tensor.expand()) before the concatenation (using torch.cat()):
import torch
a = torch.randn(15, 200, 2048)
b = torch.randn(1, 200, 2048)
repeat_vals = [a.shape[0] // b.shape[0]] + [-1] * (len(b.shape) - 1)
# or directly repeat_vals = (15, -1, -1) or (15, 200, 2048) if shapes are known and fixed...
res = torch.cat((a, b.expand(*repeat_vals)), dim=-1)
print(res.shape)
# torch.Size([15, 200, 4096])
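The same expand-then-concatenate idea works in NumPy, where np.broadcast_to plays the role of Tensor.expand() (it returns a read-only view without copying) and np.concatenate the role of torch.cat(). A minimal sketch with random data of the shapes from the question:

```python
import numpy as np

a = np.random.randn(15, 200, 2048)
b = np.random.randn(1, 200, 2048)

# broadcast_to repeats b along axis 0 as a view, like b.expand(15, -1, -1)
b_wide = np.broadcast_to(b, (a.shape[0],) + b.shape[1:])
# concatenate along the last axis, like torch.cat(..., dim=-1)
res = np.concatenate((a, b_wide), axis=-1)
print(res.shape)  # (15, 200, 4096)
```

The concatenation itself still allocates a new array; only the broadcast step is copy-free.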

How to use tf.data.Dataset.apply() for reshaping the dataset

I am working with time series models in TensorFlow. My dataset contains physics signals. I need to divide these signals into windows and give these sliced windows as input to my model.
Here is how I am reading the data and slicing it:
import tensorflow as tf
import numpy as np
def _ds_slicer(data):
    win_len = 768
    # split each 24576-sample signal into 768 windows of 32 samples
    return {"mix": tf.stack(tf.split(data["mix"], win_len)),
            "pure": tf.stack(tf.split(data["pure"], win_len))}

dataset = tf.data.Dataset.from_tensor_slices({
    "mix": np.random.uniform(0, 1, [1000, 24576]),
    "pure": np.random.uniform(0, 1, [1000, 24576])
})
dataset = dataset.map(_ds_slicer)
print(dataset.output_shapes)
# {'mix': TensorShape([Dimension(768), Dimension(32)]), 'pure': TensorShape([Dimension(768), Dimension(32)])}
I want to reshape this dataset to: # {'mix': TensorShape([Dimension(32)]), 'pure': TensorShape([Dimension(32)])}
The equivalent transformation in NumPy would be something like the following:
import numpy as np

signal = np.random.uniform(0, 1, [1000, 24576])
sliced_sig = np.stack(np.split(signal, 768, axis=1), axis=1)
print(sliced_sig.shape)  # (1000, 768, 32)
sliced_sig = sliced_sig.reshape(-1, sliced_sig.shape[-1])
print(sliced_sig.shape)  # (768000, 32)
I thought of using tf.contrib.data.group_by_window as an input to dataset.apply() but couldn't figure out exactly how to use it. Is there a way I can use any custom transformation to reshape the dataset?
I think you're just looking for the transformation tf.contrib.data.unbatch. This does exactly what you want:
x = np.zeros((1000, 768, 32))
dataset = tf.data.Dataset.from_tensor_slices(x)
print(dataset.output_shapes) # (768, 32)
dataset = dataset.apply(tf.contrib.data.unbatch())
print(dataset.output_shapes) # (32,)
From the documentation:
If elements of the dataset are shaped [B, a0, a1, ...], where B may vary from element to element, then for each element in the dataset, the unbatched dataset will contain B consecutive elements of shape [a0, a1, ...].
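To make the quoted semantics concrete, including the case where B varies from element to element, here is a plain Python sketch; unbatch here is a hypothetical stand-in written for illustration, not TensorFlow's implementation.

```python
import numpy as np

def unbatch(elements):
    # Hypothetical illustration: each element of shape [B, a0, a1, ...]
    # is split into B consecutive elements of shape [a0, a1, ...]
    for element in elements:
        for row in element:
            yield row

# B varies from element to element (3, then 1, then 2)
batches = [np.zeros((3, 32)), np.ones((1, 32)), np.full((2, 32), 2.0)]
rows = list(unbatch(batches))
print(len(rows), rows[0].shape)        # 6 (32,)
print([float(r[0]) for r in rows])     # [0.0, 0.0, 0.0, 1.0, 2.0, 2.0]
```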
Edit for TF 2.0
(Thanks @DavidParks)
From TF 2.0, you can use tf.data.Dataset.unbatch directly:
x = np.zeros((1000, 768, 32))
dataset = tf.data.Dataset.from_tensor_slices(x)
print(dataset.element_spec)  # TensorSpec with shape (768, 32)
dataset = dataset.unbatch()
print(dataset.element_spec)  # TensorSpec with shape (32,)

Simple network for arbitrary shape input

I am trying to create an autoencoder in Keras with the TensorFlow backend. I followed this tutorial in order to make my own. Input to the network is somewhat arbitrary, i.e. each sample is a 2D array with a fixed number of columns (12 in this case), but the number of rows ranges between 4 and 24.
What I have tried so far is:
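import random
import numpy as np

# Generating random data
myTraces = []
for i in range(100):
    num_events = random.randint(4, 24)
    traceTmp = np.random.randint(2, size=(num_events, 12))
    myTraces.append(traceTmp)
# ragged rows: NumPy >= 1.24 needs dtype=object here, older versions did this implicitly
myTraces = np.array(myTraces, dtype=object)  # (read Note down below)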
and here is my sample model:
from keras.layers import Input, Conv1D, MaxPool1D, UpSampling1D
from keras.models import Model

input = Input(shape=(None, 12))
x = Conv1D(64, 3, padding='same', activation='relu')(input)
x = MaxPool1D(strides=2, pool_size=2)(x)
x = Conv1D(128, 3, padding='same', activation='relu')(x)
x = UpSampling1D(2)(x)
x = Conv1D(64, 3, padding='same', activation='relu')(x)
x = Conv1D(12, 1, padding='same', activation='relu')(x)
model = Model(input, x)
model.compile(optimizer='adadelta', loss='binary_crossentropy')
model.fit(myTraces, myTraces, epochs=50, batch_size=10, shuffle=True, validation_data=(myTraces, myTraces))
NOTE: The Keras docs say that the input should be a NumPy array; if I do so, I get the following error:
ValueError: Error when checking input: expected input_1 to have 3 dimensions, but got array with shape (100, 1)
And if I don't convert it into a NumPy array and leave it as a list of NumPy arrays, I get the following error:
ValueError: Error when checking model input: the list of Numpy arrays that you are passing to your model is not the size the model expected. Expected to see 1 array(s), but instead got the following list of 100 arrays: [array([[0, 1, 0, 0 ...
I don't know what I am doing wrong here. I am also fairly new to Keras, so I would really appreciate any help with this.
NumPy does not know how to handle a list of arrays with varying numbers of rows (see this answer). When you call np.array on myTraces, you get an array of objects (each entry one of your traceTmp arrays), not a 3D numeric array; an array of shape (100, 1) is effectively a list of 100 arrays.
Keras also needs a homogeneous array, meaning all input arrays must have the same shape.
What you can do is pad the arrays with zeros so that they all have the shape (24, 12): then np.array can return a 3-dimensional array, and the Keras input layer no longer complains.
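A minimal sketch of that zero-padding, using synthetic traces generated as in the question (an assumption for illustration): preallocate a zero array of shape (n_traces, 24, 12) and copy each trace into its first rows. A length of 24 is even, so the MaxPool1D(2) / UpSampling1D(2) pair in the model gives back 24 rows and the output matches the target.

```python
import random
import numpy as np

# Synthetic traces as in the question: 4 to 24 rows, 12 columns each
traces = [np.random.randint(2, size=(random.randint(4, 24), 12)) for _ in range(100)]

max_rows = 24  # longest possible trace in this example
padded = np.zeros((len(traces), max_rows, 12), dtype=traces[0].dtype)
for i, t in enumerate(traces):
    padded[i, :t.shape[0], :] = t  # copy the trace; remaining rows stay zero

print(padded.shape)  # (100, 24, 12)
```

Note that the padded zero rows also enter the reconstruction loss; whether that matters depends on the data.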
