How do I create image sequence samples using tf.data?

I want to create image sequence samples using the tf.data API, but so far there seems to be no easy way to concatenate multiple images into a single sample. I have tried the dataset.window function, which groups my images correctly, but I don't know how to concatenate the images in each window.
import tensorflow as tf
from glob import glob
def load_and_process_image(path):
    img = tf.io.read_file(path)
    img = tf.image.decode_jpeg(img, channels=3)
    img = tf.image.resize(img, [IMG_WIDTH, IMG_HEIGHT])
    img = tf.reshape(img, shape=(IMG_WIDTH, IMG_HEIGHT, 1, 3))
    return img
def create_dataset(files, time_distance=8, frame_step=1):
    dataset = tf.data.Dataset.from_tensor_slices(files)
    dataset = dataset.map(load_and_process_image)
    dataset = dataset.window(time_distance, 1, frame_step, True)
    # TODO: Concatenate elements from dataset.window
    return dataset
files = sorted(glob('some/path/*.jpg'))
images = create_dataset(files)
I know that I could save my image sequences as TFRecords, but that would make my data pipeline much less flexible and would cost a lot of memory.
My input batches should have the form N x W x H x T x C
(N: Number of samples
W: Image Width
H: Image Height
T: Image Sequence length
C: Image Channels).

You can use batching to create batches of size N:
iterations = ...  # set this to the number of batches you want to generate
batched_dataset = dataset.batch(N)
for batch in batched_dataset.take(iterations):
    ...  # process your batch
Here iterations is the number of batches you want to generate.
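One common way to turn each window produced by dataset.window into a single tensor is to batch the window inside flat_map. The sketch below is only an illustration under a few assumptions: frames have shape (W, H, C) instead of the (W, H, 1, C) reshape used in the question, and make_sequences and the constant stand-in frames are hypothetical names that do not appear in the original code.

```python
import tensorflow as tf

def make_sequences(frames, time_distance=8, frame_step=1):
    """Group consecutive (W, H, C) frames into tensors of shape (W, H, T, C)."""
    windows = frames.window(time_distance, shift=1, stride=frame_step,
                            drop_remainder=True)
    # Each window is itself a small dataset; batching it yields one (T, W, H, C) tensor.
    stacked = windows.flat_map(
        lambda w: w.batch(time_distance, drop_remainder=True))
    # Move the time axis behind width and height: (T, W, H, C) -> (W, H, T, C).
    return stacked.map(lambda seq: tf.transpose(seq, perm=[1, 2, 0, 3]))

# Stand-in frames: 10 constant images of shape (4, 5, 3) whose value is the frame index.
frames = tf.data.Dataset.range(10).map(
    lambda i: tf.fill([4, 5, 3], tf.cast(i, tf.float32)))

sequences = make_sequences(frames, time_distance=8).batch(2)
batch = next(iter(sequences))
print(batch.shape)  # (2, 4, 5, 8, 3), i.e. N x W x H x T x C
```

With real data, the stand-in frames would be replaced by the mapped image dataset, and the final batch(N) gives the N x W x H x T x C layout asked for.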


Converting an image captioner's generator method from a single image to a batch

I was following the TensorFlow guide on image captioning and everything is working great, but I want to convert the method that generates a caption for an input image so that it takes a batch of images instead of one.
For example, this is the current generator method:
def simple_gen(self, image, temperature=1):
    initial = self.word_to_index([['[START]']]) # (batch, sequence)
    img_features = self.feature_extractor(image[tf.newaxis, ...])
    tokens = initial # (batch, sequence)
    for n in range(50):
        preds = self((img_features, tokens)).numpy() # (batch, sequence, vocab)
        preds = preds[:,-1, :] #(batch, vocab)
        if temperature==0:
            next = tf.argmax(preds, axis=-1)[:, tf.newaxis] # (batch, 1)
        else:
            next = tf.random.categorical(preds/temperature, num_samples=1) # (batch, 1)
        tokens = tf.concat([tokens, next], axis=1) # (batch, sequence)
        if next[0] == self.word_to_index('[END]'):
            break
    words = idx_to_word(tokens[0, 1:-1])
    result = tf.strings.reduce_join(words, axis=-1, separator=' ')
    return result.numpy().decode()
It takes one image, loaded by this function:
def load_img(img_path):
    img = tf.io.read_file(img_path)
    img = tf.io.decode_jpeg(img, channels=3)
    img = tf.image.resize(img, IMAGE_SHAPE[:-1])
    return img
The load_img function takes an image path, and the generator method returns the generated caption for that image.
What I tried: I have a tf.data dataset that contains image paths and their corresponding captions. I used the following code to load every image in the dataset, loop over them, and call simple_gen on each one, but it is very slow and inefficient. I'm looking for a better way to optimize the method.
for (img, capt) in dataset.map(lambda img, capt: (load_img(img), capt)):  # dataset: the (path, caption) dataset described above
    preds = []
    for t in [0.0, 0.5, 1.0]:
        result = model.simple_gen(img)
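One possible direction for batching such a loop is to decode all images together and keep a per-row "done" mask, so that rows that have already emitted the end token are simply padded while the others continue. The sketch below shows only that control flow in NumPy with greedy (temperature 0) decoding. It is not the guide's implementation: batched_greedy_decode, logits_fn and toy_logits are hypothetical names, and logits_fn stands in for a call to the captioning model that returns the logits of the last position for every row.

```python
import numpy as np

def batched_greedy_decode(logits_fn, batch_size, start_id, end_id, max_len=50):
    """Greedy decoding for a whole batch, tracking which rows have finished."""
    tokens = np.full((batch_size, 1), start_id, dtype=np.int64)
    done = np.zeros(batch_size, dtype=bool)
    for _ in range(max_len):
        logits = logits_fn(tokens)                    # (batch, vocab)
        next_ids = logits.argmax(axis=-1)             # (batch,)
        next_ids = np.where(done, end_id, next_ids)   # finished rows are padded with end_id
        tokens = np.concatenate([tokens, next_ids[:, None]], axis=1)
        done |= next_ids == end_id
        if done.all():                                # stop once every row has ended
            break
    return tokens

def toy_logits(tokens):
    """Toy model: row r emits token 1 for r + 1 steps, then the end token 4."""
    batch, length = tokens.shape
    logits = np.zeros((batch, 5))
    for row in range(batch):
        target = 1 if length - 1 < row + 1 else 4
        logits[row, target] = 1.0
    return logits

result = batched_greedy_decode(toy_logits, batch_size=2, start_id=0, end_id=4)
print(result.tolist())  # [[0, 1, 4, 4], [0, 1, 1, 4]]
```

In the real method, the image features would be computed once for the whole batch before the loop, and each row would be cut at its first end token before converting indices back to words.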

How can I properly create my Dataset?

I have the following code:
imagepaths = tf.convert_to_tensor(imagepaths, dtype=tf.string)
labels = tf.convert_to_tensor(labels, dtype=tf.int32)
# Build a TF Queue, shuffle data
image, label = tf.data.Dataset.from_tensor_slices((imagepaths, labels))
and am getting the following error:
image, label = tf.data.Dataset.from_tensor_slices((imagepaths, labels))
ValueError: too many values to unpack (expected 2)
Shouldn't Dataset.from_tensor_slices see this as the length of the tensor, not the number of inputs? How can I fix this issue, or combine the data tensors into a single variable more effectively?
Just for reference:
There are 1800 image paths and 1800 labels, corresponding to each other one to one. To be clear, the image paths point to the files where the JPEG images are located. My goal after this is to shuffle the dataset and build the neural network model.
That code is right here:
# Read images from disk
image = tf.read_file(image)
image = tf.image.decode_jpeg(image, channels=CHANNELS)
# Resize images to a common size
image = tf.image.resize_images(image, [IMG_HEIGHT, IMG_WIDTH])
# Normalize
image = image * 1.0/127.5 - 1.0
# Create batches
X, Y = tf.train.batch([image, label], batch_size=batch_size,
capacity=batch_size * 8,
Try this:
def transform(entry):
img = entry[0]
lbl = entry[1]
return img, lbl
raw_data = list(zip(imagepaths, labels))
dataset =
dataset =
If you want to have a look at your dataset, you can do it like this:
for e in dataset.take(1):
    print(e)
You can add multiple map functions, and after that you can use shuffle and batch on your dataset to prepare it for training ;)
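The "too many values to unpack" error comes from the fact that Dataset.from_tensor_slices returns one Dataset object, so the (path, label) pair can only be unpacked per element while iterating. A minimal sketch, assuming TensorFlow 2.x with eager execution; the four file names and labels are hypothetical stand-ins for the 1800 real entries:

```python
import tensorflow as tf

# Hypothetical stand-ins for the real image paths and labels.
imagepaths = ["img_0.jpg", "img_1.jpg", "img_2.jpg", "img_3.jpg"]
labels = [0, 1, 0, 1]

# from_tensor_slices returns ONE Dataset; it is not unpacked into two values here.
dataset = tf.data.Dataset.from_tensor_slices((imagepaths, labels))
dataset = dataset.shuffle(buffer_size=len(imagepaths)).batch(2)

# Unpacking happens per element, while iterating over the dataset.
for path_batch, label_batch in dataset.take(1):
    print(path_batch.shape, label_batch.shape)  # (2,) (2,)
```

A map step that reads and decodes each path would normally go between from_tensor_slices and shuffle/batch; it is left out here because the stand-in files do not exist.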

Why would this dataset implementation run out of memory?

I followed this instruction and wrote the following code to create a Dataset for images (the COCO2014 training set):
from pathlib import Path
import tensorflow as tf
def image_dataset(filepath, image_size, batch_size, norm=True):
    def preprocess_image(image):
        image = tf.image.decode_jpeg(image, channels=3)
        image = tf.image.resize(image, image_size)
        if norm:
            image /= 255.0 # normalize to [0,1] range
        return image
    def load_and_preprocess_image(path):
        image = tf.read_file(path)
        return preprocess_image(image)
    all_image_paths = [str(f) for f in Path(filepath).glob('*')]
    path_ds = tf.data.Dataset.from_tensor_slices(all_image_paths)
    ds = path_ds.map(load_and_preprocess_image, num_parallel_calls=tf.data.experimental.AUTOTUNE)
    ds = ds.shuffle(buffer_size = len(all_image_paths))
    ds = ds.repeat()
    ds = ds.batch(batch_size)
    ds = ds.prefetch(tf.data.experimental.AUTOTUNE)
    return ds
ds = image_dataset(train2014_dir, (256, 256), 4, False)
image = ds.make_one_shot_iterator().get_next('images')
# image is then fed to the network
This code always runs out of both host memory (32 GB) and GPU memory (11 GB), and the process gets killed.
I also noticed that the program gets stuck at one point. What is wrong? How can I fix it?
The problem is this:
ds = ds.shuffle(buffer_size = len(all_image_paths))
The buffer that Dataset.shuffle() uses is an 'in memory' buffer so you are effectively trying to load the whole dataset in memory.
You have a couple of options (which you can combine) to fix this:
Option 1:
Reduce the buffer size to a much smaller number.
Option 2:
Move the shuffle() statement before the map() statement.
This means we shuffle before we load the images, so the shuffle buffer only stores the filenames in memory rather than huge image tensors.
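The reordering from Option 2 can be sketched as follows. This is an illustration, not the asker's exact pipeline: it assumes TensorFlow 2.4 or later (for tf.data.AUTOTUNE), and build_pipeline and fake_load are hypothetical names, with fake_load standing in for the real file-reading function so the example runs without any image files.

```python
import tensorflow as tf

def build_pipeline(paths, load_fn, batch_size):
    ds = tf.data.Dataset.from_tensor_slices(paths)
    # Shuffle while elements are still short path strings, so a buffer as
    # large as the whole dataset stays cheap.
    ds = ds.shuffle(buffer_size=len(paths))
    # Only now turn each path into a (large) image tensor.
    ds = ds.map(load_fn, num_parallel_calls=tf.data.AUTOTUNE)
    return ds.repeat().batch(batch_size).prefetch(tf.data.AUTOTUNE)

def fake_load(path):
    # Stand-in for reading, decoding and resizing the file at `path`.
    return tf.zeros([256, 256, 3], dtype=tf.float32)

paths = ["{}.jpg".format(i) for i in range(10)]
ds = build_pipeline(paths, fake_load, batch_size=4)
batch = next(iter(ds))
print(batch.shape)  # (4, 256, 256, 3)
```

If shuffling has to stay after map for some reason, Option 1 (a much smaller buffer_size) bounds the number of decoded images held in memory instead.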

ValueError: cannot reshape array of size 315 into shape (32,32)

I'm trying to use the code in this page:
import cv2
import glob
import numpy as np
#Train data
train = []
train_labels = []
files = glob.glob(r"C:\Users\Downloads\All_Codes\image\0\*.png") # your image path
for myFile in files:
    image = cv2.imread(myFile, cv2.IMREAD_GRAYSCALE)
    train.append(input_img_resize)
files = glob.glob(r"C:\Users\Downloads\All_Codes\image\1\*.png")
for myFile in files:
    image = cv2.imread(myFile, cv2.IMREAD_GRAYSCALE)
    train.append(image)
train = np.array(train,dtype=object) #as mnist
train_labels = np.array(train_labels,dtype=object) #as mnist
# convert (number of images x height x width x number of channels) to (number of images x (height * width *3))
# for example (120 * 40 * 40 * 3)-> (120 * 4800)
train = np.reshape(train,(train.shape[0],64,64))
# save numpy array as .npy formats
np.save('train', train)
np.save('train_labels', train_labels)
But I get errors. Every time I read my images and try to reshape them with np.reshape, I get the same error. I searched a lot and tried many code samples, and they all fail the same way: I can't reshape my dataset to (number of images, 32, 32), which is the shape I want to feed to my CNN model. The only thing I know for sure is that the images in my dataset have different shapes. Is this why I'm having difficulty reshaping them? If so, what is the point of having both "resize" and "reshape"?
the first error is:
ValueError: cannot reshape array of size 315 into shape (315,32,32)
for this line:
train = np.reshape(train,[train.shape[0],32,32])
So, I solved the problem.
import cv2
import glob
import numpy as np
import PIL.Image
#Train data
train = []
train_labels = []
files = glob.glob(r"\train\0\*.png") # your image path
for myFile in files:
    image = cv2.imread(myFile, cv2.IMREAD_GRAYSCALE)
    input_img_resize = cv2.resize(image, (64, 64)) # give every image the same size
    train.append(input_img_resize)
files = glob.glob(r"\train\1\*.png")
for myFile in files:
    image = cv2.imread(myFile, cv2.IMREAD_GRAYSCALE)
    input_img_resize = cv2.resize(image, (64, 64)) # give every image the same size
    train.append(input_img_resize)
train = np.array(train, dtype="float32") #as mnist
train_labels = np.array(train_labels, dtype="float32") #as mnist
train = np.reshape(train, (-1, 64, 64, 1))
I resized my images using cv2.resize (inside the loop),
then reshaped the result using np.reshape.
Relying on only one of them does not work; I had to use both.
The output is:
315 #len for x and y
(315, 64, 64) #after cv2.resize
(315, 1)
(315, 64, 64, 1) #after np.reshape
(315, 1)
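The difference between the two operations is that reshape only rearranges the elements an array already has, while resize changes how many pixels each image contains. When the images have different shapes, NumPy can only store them as a 1-D array of 315 objects, which is why it reports "size 315". The sketch below reproduces this with two images; nearest_resize is a hypothetical NumPy stand-in for cv2.resize so that the example needs no OpenCV install.

```python
import numpy as np

def nearest_resize(image, height, width):
    """Nearest-neighbour resize of a 2-D image (stand-in for cv2.resize)."""
    rows = np.arange(height) * image.shape[0] // height
    cols = np.arange(width) * image.shape[1] // width
    return image[rows[:, None], cols]

# Two grayscale images with different shapes, as in the question.
images = [np.zeros((40, 50), dtype=np.uint8), np.ones((70, 30), dtype=np.uint8)]

# Without resizing, NumPy can only hold them as a 1-D array of objects.
ragged = np.empty(len(images), dtype=object)
for i, img in enumerate(images):
    ragged[i] = img
print(ragged.shape)  # (2,)

try:
    ragged.reshape(len(images), 64, 64)
    reshape_failed = False
except ValueError:
    reshape_failed = True  # 2 elements cannot fill 2 * 64 * 64 slots

# Resize first so every image has 64 * 64 pixels, then reshape to add a channel axis.
resized = np.stack([nearest_resize(img, 64, 64) for img in images]).astype("float32")
train = resized.reshape(-1, 64, 64, 1)
print(resized.shape, train.shape)  # (2, 64, 64) (2, 64, 64, 1)
```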

resize_images with a batch

I'm trying to read three JPEG files and resize them as a TensorFlow batch, but nothing I tried has worked. One attempt is shown below. In general, how can I resize several pictures in a batch with tf.image.resize_images? I don't want to use an input reader; I want to build the batch of pictures myself.
I think it's necessary to have 4 dimensions: batch size, width, height, channels.
import numpy as np
import tensorflow as tf
sess = tf.Session()
tensor_list = []
for i in range(3):
    img = tf.read_file("{0}.jpg".format(i))
    img_tensor = tf.image.decode_jpeg(img, 3)
    img_resized = tf.image.resize_images(img_tensor, tf.convert_to_tensor([800, 400]), tf.image.ResizeMethod.NEAREST_NEIGHBOR)
    img_tensor_dim = tf.expand_dims(img_resized, 0)
batch = tf.train.batch(tensor_list, batch_size=3, enqueue_many=False)
img_resized = tf.image.resize_images(batch, tf.convert_to_tensor([400, 200]), tf.image.ResizeMethod.NEAREST_NEIGHBOR)
for i in range(3):
    tmp = img_resized[i]
    endcode_jpg = tf.image.encode_jpeg(tmp, x_density=96, y_density=96)
    wr = tf.write_file('{0}_out.jpg'.format(i), endcode_jpg)
You can use the tf.map_fn() operation to apply the resizing logic to a vector of strings containing your image data:
import tensorflow as tf
# Build a tensor containing the image data as a vector of strings.
images = []
for i in range(3):
    images.append(tf.read_file("{0}.jpg".format(i)))
images = tf.stack(images)
# `resize_fn()` contains the logic for resizing and encoding one image.
def resize_fn(img):
    img_tensor = tf.image.decode_jpeg(img, 3)
    img_resized = tf.image.resize_images(
        [img_tensor], [800, 400], tf.image.ResizeMethod.NEAREST_NEIGHBOR)[0]
    img_encoded = tf.image.encode_jpeg(img_resized, x_density=96, y_density=96)
    return img_encoded
# `tf.map_fn()` applies `resize_fn()` to each image in turn, and
# returns a vector of encoded images.
encoded_images = tf.map_fn(resize_fn, images)
write_ops = []
for i in range(3):
    write_ops.append(tf.write_file("{0}_out.jpg".format(i), encoded_images[i]))
with tf.Session() as sess:
    sess.run(write_ops)  # run all three write ops
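For readers on TensorFlow 2.x, where tf.image.resize_images has become tf.image.resize, a 4-D tensor of shape (batch, height, width, channels) can be resized in one call once all images share a size. The following is a minimal sketch under that assumption; the zero-filled tensors are stand-ins for decoded JPEGs so the example runs without any files.

```python
import tensorflow as tf

NEAREST = tf.image.ResizeMethod.NEAREST_NEIGHBOR

# Stand-ins for three decoded images of different sizes (height, width, channels).
images = [
    tf.zeros([600, 300, 3], dtype=tf.uint8),
    tf.zeros([1000, 500, 3], dtype=tf.uint8),
    tf.zeros([800, 400, 3], dtype=tf.uint8),
]

# Images must share a size before they can be stacked into one batch tensor.
common = [tf.image.resize(img, [800, 400], method=NEAREST) for img in images]
batch = tf.stack(common)                      # (3, 800, 400, 3)

# A 4-D batch is resized in a single call.
resized = tf.image.resize(batch, [400, 200], method=NEAREST)
print(resized.shape)  # (3, 400, 200, 3)
```

Note that the size argument is given as [height, width], so the batch layout is (batch, height, width, channels).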

