Reshaping numpy array of images results in extra dimension - python

I have 440 images, all with the same size of 924 x 640 and three channels. I load them via
import os
from glob import iglob
import matplotlib.pyplot as plt

image_data = []
for filename in iglob(os.path.join(store, '*.jpg')):
    image_data.append(plt.imread(filename))
Then I make a numpy ndarray from this list:
image_np_orig = np.array(image_data)
This array has shape (440,) and consists of elements with shape (924, 640, 3). I want to run some t-SNE transformations on this array of images, so I want to reshape the array so that its shape looks like (440, 1):
image_np = image_np_orig.reshape(image_np_orig.shape[0], -1)
Expectation / Reality
I expect to see an array image_np of shape (440, 1) where each element of the first dimension (axis=0) is an array of shape (924, 640, 3). However, I get an array image_np of shape (440, 1) where each element of the first dimension is an array of shape (1,), and in those arrays each element of their respective first dimensions has shape (924, 640, 3).
What I've tried
I've tried
image_np = image_np_orig[:, np.newaxis]
with the same results.
I've also tried
image_np = np.stack(image_np_orig)
which led to image_np with a shape of (440, 924, 640, 3), and then I got an error during the t-SNE transform:
from sklearn.manifold import TSNE
tsne = TSNE(n_components=2, init='pca')
X_tsne = tsne.fit_transform(image_np)
returns ValueError: Found array with dim 4. Estimator expected <=2.
Probably relevant
It may be relevant that image_np_orig has dtype object and image_np_orig[0] has dtype uint8. If this is relevant then how can I reshape arrays of different types?

From what I understand, you have an array of shape (440, 1, 924, 640, 3), but you actually need (440, 924, 640, 3).
Try:
image_np = image_np_orig.squeeze()
This will squeeze out the unwanted dimension.

I'm not sure why the first approach doesn't work for you, but since image_np = np.stack(image_np_orig) returns the 4D data, you can go from there:
image_np = np.stack(image_np_orig).reshape(len(image_np_orig), -1)
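For context, here is a minimal end-to-end sketch of that flattening step feeding t-SNE (random arrays stand in for the real images; the shapes follow the question):
import numpy as np
from sklearn.manifold import TSNE

# Stand-ins for the 440 loaded images, each 924 x 640 x 3 (uint8)
images = [np.random.randint(0, 256, (924, 640, 3), dtype=np.uint8) for _ in range(440)]

# Stack into one 4D array, then flatten each image into a single row
image_np = np.stack(images).reshape(len(images), -1)  # shape: (440, 924 * 640 * 3)

tsne = TSNE(n_components=2, init='pca')
X_tsne = tsne.fit_transform(image_np)  # shape: (440, 2)
Note that rows this wide make t-SNE very slow; reducing the dimensionality first (e.g. with PCA) is a common preprocessing step.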

Related

Adding an additional channel to a 3 channel tensor

I would like to combine a tensor of shape [3, 1024, 1024] and a tensor of shape [1, 1024, 1024] in order to form a single tensor of shape [4, 1024, 1024].
This is to combine the channels of an RGB image with a depth image, in the format [r, g, b, d] for each pixel.
I am currently trying to do this like this:
tensor = tf.concat([imageTensor, depthTensor], axis=2)
But I receive the error
InvalidArgumentError: ConcatOp : Dimensions of inputs should match: shape[0] = [3,1024,1024] vs. shape[1] = [1,1024,1024] [Op:ConcatV2]
I was just wondering how this would be done?
You want to concatenate on axis=0, the channel axis for these channels-first tensors. tf.concat requires every dimension except the concatenation axis to match, and your two tensors differ in dimension 0, which is why concatenating on axis=2 fails:
import tensorflow as tf
t1 = tf.random.uniform((3, 1024, 1024))
t2 = tf.random.uniform((1, 1024, 1024))
final_tensor = tf.concat((t1, t2), axis=0)
print(final_tensor.shape)
(4, 1024, 1024)
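If the image and depth map were stored channels-last instead, i.e. shapes (1024, 1024, 3) and (1024, 1024, 1), the same idea applies on the last axis (a sketch assuming that layout):
import tensorflow as tf

rgb = tf.random.uniform((1024, 1024, 3))    # stand-in RGB image
depth = tf.random.uniform((1024, 1024, 1))  # stand-in depth map
rgbd = tf.concat((rgb, depth), axis=-1)     # shape: (1024, 1024, 4)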

Keras error with captured frames when calling predict with frames from game

I am using the SerpentAI library to capture a game frame, build a frame stack, and feed it to the Keras predict function.
When doing this, a ValueError occurs.
Here's me creating a frame stack:
full_game_frame = FrameGrabber.get_frames(
    [0],
    frame_shape=(960, 600),
    frame_type="PIPELINE"
).frames[0]
self.dqn_direction.build_frame_stack(full_game_frame.frame)
Build frame stack function:
def build_frame_stack(self, game_frame):
    frame_stack = np.stack((
        game_frame,
        game_frame,
        game_frame,
        game_frame
    ), axis=2)
    self.frame_stack = frame_stack.reshape((1,) + frame_stack.shape)
The error says:
File "a:\anaconda\envs\serpent2\lib\site-packages\keras\engine\training.py", line 1695, in predict
check_batch_axis=False)
File "a:\anaconda\envs\serpent2\lib\site-packages\keras\engine\training.py", line 132, in _standardize_input_data
str(array.shape))
ValueError: Error when checking : expected input_2 to have 4 dimensions, but got array with shape (1, 600, 960, 4, 3)
I assumed cutting one array dimension off the stack would help, but it raised another error:
ValueError: Error when checking: expected input_2 to have shape (None, 960, 600, 4) but got array with shape (1, 600, 960, 4)
Any ideas how to fix this?
Also, getting the frame like this doesn't work:
# full_game_frame = game_frame
You still have a problem in your dimensions. As the error says, the expected input has shape (None, 960, 600, 4), whereas you try to pass an array with shape (1, 600, 960, 4).
Switching dimensions 1 and 2 (effectively transposing the image) should remove the error.
Additionally, I don't see the necessity of stacking the image 4 times. Your first error says your dimensions are (1, 600, 960, 4, 3), which means you have an RGB image at each position. I'm assuming the net takes an RGBA image as input. Instead of stacking the frame 4 times, try adding just one channel of ones (an alpha channel) to the frame. This should give you an image with shape (960, 600, 4).
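A minimal sketch of that suggestion (the frame shape follows the error messages; that the net really wants an RGBA-style input is this answer's assumption):
import numpy as np

game_frame = np.random.rand(600, 960, 3)  # stand-in for the captured RGB frame

# Append a channel of ones as a pseudo alpha channel: (600, 960, 3) -> (600, 960, 4)
alpha = np.ones(game_frame.shape[:2] + (1,), dtype=game_frame.dtype)
frame = np.concatenate((game_frame, alpha), axis=2)

# Swap height and width to match the expected layout: (600, 960, 4) -> (960, 600, 4)
frame = frame.transpose(1, 0, 2)

# Add the batch axis the model expects: (1, 960, 600, 4)
frame_stack = frame[np.newaxis, ...]
print(frame_stack.shape)  # (1, 960, 600, 4)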

How to append numpy ndarrays (images) to get a dataset similar to MNIST

import cv2
import numpy as np

img = cv2.imread('resized_data/train/normal/IM-0115-0001.jpeg')
img2 = cv2.imread('resized_data/train/normal/IM-0117-0001.jpeg')

imgs = []
imgs.append(img)
imgs.append(img2)
imgs = np.array(imgs)
So far I have two numpy.ndarrays, each with shape (256, 256, 3).
I append them to a list, which I then convert to a numpy ndarray. But when I check imgs.shape, it is (2,).
Why is the shape of the imgs array (2,) and not (2, 256, 256, 3)?
Thanks in advance.
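np.array only produces a (2, 256, 256, 3) array when every element has exactly the same shape; if the two shapes differ, or cv2.imread returned None because a path was wrong, the result degrades to a 1-D object array of shape (2,) (or, on newer NumPy versions, raises an error). A quick check, as a sketch:
# Inspect what was actually loaded; cv2.imread returns None on failure
for i, im in enumerate(imgs):
    print(i, None if im is None else im.shape)

# np.stack enforces a single 4D array and fails loudly on a mismatch
imgs = np.stack(imgs)  # expected shape: (2, 256, 256, 3)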

What's the cleanest and most efficient way to pass two stereo images to a loss function in Keras?

First off, why am I using Keras? I'm trying to stay as high level as possible, which doesn't mean I'm scared of low-level Tensorflow; I just want to see how far I can go while keeping my code as simple and readable as possible.
I need my Keras model (custom-built using the Keras functional API) to read the left image from a stereo pair and minimize a loss function that needs to access both the right and left images. I want to store the data in a tf.data.Dataset.
What I tried:
Reading the dataset as (left image, right image), i.e. as tensors with shape ((W, H, 3), (W, H, 3)), then using a function closure: define a keras_loss(left_images) that returns a loss(y_true, y_pred), with y_true being a tf.Tensor that holds the right image. The problem with this approach is that left_images is a tf.data.Dataset, and Tensorflow complains (rightly so) that I'm trying to operate on a dataset instead of a tensor.
Reading the dataset as (left image, (left image, right image)), which should make y_true a tf.Tensor with shape ((W, H, 3), (W, H, 3)) that holds both the right and left images. The problem with this approach is that it simply does not work and raises the following error:
ValueError: Error when checking model target: the list of Numpy arrays
that you are passing to your model is not the size the model expected.
Expected to see 1 array(s), for inputs ['tf_op_layer_resize/ResizeBilinear']
but instead got the following list of 2 arrays: [<tf.Tensor 'args_1:0'
shape=(None, 512, 256, 3) dtype=float32>, <tf.Tensor 'args_2:0'
shape=(None, 512, 256, 3) dtype=float32>]...
So, is there anything I did not consider? I read the documentation and found nothing about what gets considered as y_pred and what as y_true, nor about how to convert a dataset into a tensor smartly and without loading it all in memory.
My model is designed as such:
def my_model(input_shape):
    width = input_shape[0]
    height = input_shape[1]
    inputs = tf.keras.Input(shape=input_shape)
    # < a few more layers >
    outputs = tf.image.resize(
        tf.nn.sigmoid(tf.slice(disp6, [0, 0, 0, 0], [-1, -1, -1, 2])),
        tf.Variable([width, height]))
    model = tf.keras.Model(inputs=inputs, outputs=outputs)
    return model
And my dataset is built as such (in case 2, while in case 1 only the function read_stereo_pair_from_line() changes):
def read_img_from_file(file_name):
    img = tf.io.read_file(file_name)
    # convert the compressed string to a 3D uint8 tensor
    img = tf.image.decode_png(img, channels=3)
    # use `convert_image_dtype` to convert to floats in the [0, 1] range
    img = tf.image.convert_image_dtype(img, tf.float32)
    # resize the image to the desired size
    return tf.image.resize(img, [args.input_width, args.input_height])

def read_stereo_pair_from_line(line):
    split_line = tf.strings.split(line, ' ')
    return read_img_from_file(split_line[0]), (read_img_from_file(split_line[0]), read_img_from_file(split_line[1]))
# Dataset loading
list_ds = tf.data.TextLineDataset('test/files.txt')
images_ds = list_ds.map(lambda x: read_stereo_pair_from_line(x))
images_ds = images_ds.batch(1)
Solved. I just needed to read the dataset as (left image, [left image, right image]) instead of (left image, (left image, right image)), i.e. make the second item a list and not a tuple. I can then access the images as input_r = y_true[:, 1, :, :] and input_l = y_true[:, 0, :, :].
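For illustration, a sketch of what a loss built on that layout can look like (the actual photometric objective from the question is not shown; the indexing into y_true is the point):
import tensorflow as tf

def stereo_loss(y_true, y_pred):
    # y_true stacks both views along axis 1: (batch, 2, W, H, 3)
    input_l = y_true[:, 0, :, :]
    input_r = y_true[:, 1, :, :]
    # Placeholder term: a real loss would reconstruct one view from
    # y_pred and compare it against input_l / input_r here
    return tf.reduce_mean(tf.abs(input_l - input_r))

model.compile(optimizer='adam', loss=stereo_loss)
model.fit(images_ds)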

Apply a function to each dimension of a 4D array, returning a 4D array in Python

I load the MNIST test dataset, which has shape (10000, 28, 28, 1), i.e. 10000 grayscale 28x28 images. I want to apply a motion blur kernel to each of the images and get output of the same shape, (10000, 28, 28, 1).
I tried with a plain def and with np.vectorize, but neither works as I expected.
It runs on Python 3.6.
x_test.shape
--> (numpy.ndarray) (10000, 28, 28, 1)
import numpy as np
from scipy import ndimage

def blurize(x):
    # motion blur kernel
    k = np.array([[0,      0,      0,      0,      0,      0,      0.0013],
                  [0,      0,      0,      0.0086, 0.0574, 0.1061, 0.1165],
                  [0,      0.0450, 0.0938, 0.1426, 0.0938, 0.0450, 0],
                  [0.1165, 0.1061, 0.0574, 0.0086, 0,      0,      0],
                  [0.0013, 0,      0,      0,      0,      0,      0]])
    return ndimage.convolve(x.reshape(28, 28), k, mode='constant', cval=0.0)
blurred = blurize(x_test)
plt.imshow(blurred[1], interpolation='none', cmap='gray')
plt.show()
Result:
ValueError: cannot reshape array of size 7840000 into shape (28,28)
If I try blurred = blurize(x_test[1]), it works, but only for the second image. I don't want to loop over the whole array with x_test[i] and then merge the frames back into the expected (10000, 28, 28, 1) output array.
Thanks.
You can squeeze the channel axis out of the input array, give the kernel a size-1 leading axis (so the convolution leaves the batch dimension untouched), and then add the channel axis back to match the initial dimensions:
ndimage.convolve(x.squeeze(), k[None, ...], mode='constant', cval=0.0)[..., None]
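Wrapped back into the shape of the original function, a sketch (k is the motion blur kernel defined in the question):
def blurize_all(x):
    # x: (N, 28, 28, 1) -> drop the channel axis, convolve every image,
    # then restore the channel axis; the size-1 leading kernel axis
    # keeps the images independent of each other
    out = ndimage.convolve(x.squeeze(), k[None, ...], mode='constant', cval=0.0)
    return out[..., None]  # (N, 28, 28, 1)

blurred = blurize_all(x_test)
plt.imshow(blurred[1].squeeze(), interpolation='none', cmap='gray')
plt.show()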
