The example below is extracted from the official TensorFlow tutorial on data pipelines. Basically, one resizes a bunch of JPGs to be (128, 128, 3). For some reason, when applying the map() operation, the colour dimension, namely 3, is turned into None when examining the shape of the dataset. Why is that third dimension singled out? (I checked to see if there were any images that weren't (128, 128, 3) but didn't find any.)
If anything, None should only show up for the very first dimension, i.e. the one that counts the number of examples; it should not affect the individual dimensions of the examples, since, as nested structures, they are supposed to have the same shape anyway in order to be stored in a tf.data.Dataset.
The code in TensorFlow 2.1 is
import pathlib
import tensorflow as tf
# Download the files.
flowers_root = tf.keras.utils.get_file(
'flower_photos',
'https://storage.googleapis.com/download.tensorflow.org/example_images/flower_photos.tgz',
untar=True)
flowers_root = pathlib.Path(flowers_root)
# Compile the list of files.
list_ds = tf.data.Dataset.list_files(str(flowers_root/'*/*'))
# Reshape the images.
# Reads an image from a file, decodes it into a dense tensor, and resizes it
# to a fixed shape.
def parse_image(filename):
parts = tf.strings.split(filename, '\\') # Use the forward slash on Linux
label = parts[-2]
image = tf.io.read_file(filename)
image = tf.image.decode_jpeg(image)
image = tf.image.convert_image_dtype(image, tf.float32)
image = tf.image.resize(image, [128, 128])
print("Image shape:", image.shape)
return image, label
print("Map the parse_image() on the first image only:")
file_path = next(iter(list_ds))
image, label = parse_image(file_path)
print("Map the parse_image() on the whole dataset:")
images_ds = list_ds.map(parse_image)
and yields
Map the parse_image() on the first image only:
Image shape: (128, 128, 3)
Map the parse_image() on the whole dataset:
Image shape: (128, 128, None)
Why None in that last line?
From the tutorial, you are missing this part:
for image, label in images_ds.take(5):
show(image, label)
The line
images_ds = list_ds.map(parse_image)
only creates a placeholder: parse_image is traced once with a symbolic argument, and no actual image is passed to the function. If you add prints inside it, the filename shows up as a symbolic tensor rather than a concrete path, so TensorFlow cannot infer the number of channels at that point and reports it as None.
But if you use
for image, label in images_ds.take(5)
it iterates over the dataset, passing each image through the parse_image function.
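A rough way to see the difference (a sketch; the exact printed representation varies across TF versions):
# The static shape recorded when parse_image was traced leaves the channel count unknown.
print(images_ds)  # e.g. <MapDataset shapes: ((128, 128, None), ()), types: (tf.float32, tf.string)>
# Iterating actually decodes real files, so each tensor has a concrete shape.
for image, label in images_ds.take(1):
    print(image.shape)  # (128, 128, 3)
If you want the static shape to be fully known up front, one option is to tell the decoder how many channels to expect, e.g. tf.image.decode_jpeg(image, channels=3) inside parse_image.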
Related
I would like to save the tensor image as a transparent image because I want to merge two images. I have tried different solutions, but there are always tensor reshape problems and I am unable to do so. The shape of the tensor is torch.Size([1, 1, 256, 256]).
from torchvision.utils import save_image
image = net_G(input_image)
# transform = T.ToPILImage()
# image = torch.squeeze(image, 0).shape
# img = transform(image)
save_image(img, full_output_dir+'/%s.jpg' % name, transparency=255)
Is it possible to get the expected input shape from a 'model.h5' file?
I have two models for the same dataset but with different options and shapes. The first one expects a dim of (None, 64, 48, 1) and the second model needs an input shape of (None, 128, 96, 3). (Note: the width and height are not fixed and can change when I train again.)
The channels problem was easy to "fix" (or rather bypass) by just using try: and except:, because there are only two options (1 for a grayscale image and 3 for an RGB image):
channels = self.df["channels"][0]
file = ""
try:
images, src_images, data = self.get_images()
images = self.preprocess_data(images, channels)
predictions, file = self.load_model(images, file)
self.predict_data(src_images, predictions, data)
except:
if channels == 1:
print("Except channels =", channels)
channels = 3
images, src_images, data = self.get_images()
images = self.preprocess_data(images, channels)
predictions = self.load_model(images, file)
self.predict_data(src_images, predictions, data)
else:
channels = 1
print("Except channels =", channels)
images, src_images, data = self.get_images()
images = self.preprocess_data(images, channels)
predictions = self.load_model(images, file)
self.predict_data(src_images, predictions, data)
This workaround, however, cannot be used for the width and height of an image because there is basically an unlimited number of options.
Besides that, it is rather slow, because I read all the data twice and preprocess it twice for no reason.
Is there a way to load the model.h5 file and print the expected input shape in a form like this?:
[None, 128, 96, 3]
I finally found the answer myself.
model = tf.keras.models.load_model("model.h5") # load the saved model first
config = model.get_config() # Returns pretty much every piece of information about your model
print(config["layers"][0]["config"]["batch_input_shape"]) # e.g. (None, 128, 96, 3)
This will output the following:
(None, 128, 96, 3)
I found the answer from here to be more concise:
model.layers[0].input_shape[0]
The way I understand it, it should be easier to deal with multiple inputs that way too.
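For models with more than one input, a similar one-liner per input is available through model.inputs (a sketch, assuming a hypothetical multi-input model saved as model.h5):
import tensorflow as tf
model = tf.keras.models.load_model("model.h5")
# One symbolic tensor per input; each shape includes the batch dimension.
for inp in model.inputs:
    print(inp.shape)  # e.g. (None, 128, 96, 3)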
First off, why am I using Keras? I'm trying to stay as high level as possible, which doesn't mean I'm scared of low-level Tensorflow; I just want to see how far I can go while keeping my code as simple and readable as possible.
I need my Keras model (custom-built using the Keras functional API) to read the left image from a stereo pair and minimize a loss function that needs to access both the right and left images. I want to store the data in a tf.data.Dataset.
What I tried:
Reading the dataset as (left image, right image), i.e. as tensors with shape ((W, H, 3), (W, H, 3)), then use function closure: define a keras_loss(left_images) that returns a loss(y_true, y_pred), with y_true being a tf.Tensor that holds the right image. The problem with this approach is that left_images is a tf.data.Dataset and Tensorflow complains (rightly so) that I'm trying to operate on a dataset instead of a tensor.
Reading the dataset as (left image, (left image, right image)), which should make y_true a tf.Tensor with shape ((W, H, 3), (W, H, 3)) that holds both the right and left images. The problem with this approach is that it...does not work and raises the following error:
ValueError: Error when checking model target: the list of Numpy arrays
that you are passing to your model is not the size the model expected.
Expected to see 1 array(s), for inputs ['tf_op_layer_resize/ResizeBilinear']
but instead got the following list of 2 arrays: [<tf.Tensor 'args_1:0'
shape=(None, 512, 256, 3) dtype=float32>, <tf.Tensor 'args_2:0'
shape=(None, 512, 256, 3) dtype=float32>]...
So, is there anything I did not consider? I read the documentation and found nothing about what gets considered as y_pred and what as y_true, nor about how to convert a dataset into a tensor smartly and without loading it all in memory.
My model is designed as such:
def my_model(input_shape):
width = input_shape[0]
height = input_shape[1]
inputs = tf.keras.Input(shape=input_shape)
# < a few more layers >
outputs = tf.image.resize(tf.nn.sigmoid(tf.slice(disp6, [0, 0, 0, 0], [-1, -1, -1, 2])), tf.Variable([width, height]))
model = tf.keras.Model(inputs=inputs, outputs=outputs)
return model
And my dataset is built as such (in case 2, while in case 1 only the function read_stereo_pair_from_line() changes):
def read_img_from_file(file_name):
img = tf.io.read_file(file_name)
# convert the compressed string to a 3D uint8 tensor
img = tf.image.decode_png(img, channels=3)
# Use `convert_image_dtype` to convert to floats in the [0,1] range.
img = tf.image.convert_image_dtype(img, tf.float32)
# resize the image to the desired size.
return tf.image.resize(img, [args.input_width, args.input_height])
def read_stereo_pair_from_line(line):
split_line = tf.strings.split(line, ' ')
return read_img_from_file(split_line[0]), (read_img_from_file(split_line[0]), read_img_from_file(split_line[1]))
# Dataset loading
list_ds = tf.data.TextLineDataset('test/files.txt')
images_ds = list_ds.map(lambda x: read_stereo_pair_from_line(x))
images_ds = images_ds.batch(1)
Solved. I just needed to read the dataset as (left image, [left image, right image]) instead of (left image, (left image, right image)), i.e. make the second item a list and not a tuple. I can then access the images as input_r = y_true[:, 1, :, :] and input_l = y_true[:, 0, :, :].
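A minimal sketch of that change, based on the helpers above (the loss shown is only a placeholder; the real reconstruction term goes where the comment indicates):
def read_stereo_pair_from_line(line):
    split_line = tf.strings.split(line, ' ')
    left = read_img_from_file(split_line[0])
    right = read_img_from_file(split_line[1])
    # The second element is a list, not a tuple.
    return left, [left, right]

def stereo_loss(y_true, y_pred):
    input_l = y_true[:, 0, :, :]  # left images
    input_r = y_true[:, 1, :, :]  # right images
    # ...build the actual loss from y_pred, input_l and input_r here...
    return tf.reduce_mean(tf.abs(input_l - input_r))  # placeholder term only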
After training my model, I tried to plot a graph of the softmax output, but it resulted in the runtime error mentioned in the title.
Here is the code snippet:
%matplotlib inline
%config InlineBackend.figure_format = 'retina'
import helper
# Test out your network!
dataiter = iter(testloader)
images, labels = dataiter.next()
img = images[1]
# TODO: Calculate the class probabilities (softmax) for img
ps = torch.exp(model(img))
# Plot the image and probabilities
helper.view_classify(img, ps, version='Fashion')
The problem is with this part (I guess).
img = images[1]
# TODO: Calculate the class probabilities (softmax) for img
ps = torch.exp(model(img))
Problem: the image you are loading has dimensions 28x28, but the first index of the input to the model is generally the batch size. Since there is only 1 image, you have to make the first dimension of size 1. To do that, do img = img.view((-1,) + img.shape) or img = img.unsqueeze(dim=0). Also, it seems that the first layer's weight is 784 x 128, i.e. the image should be flattened into a vector before being fed to the model. For that we do img = img.view(1, -1).
So, in total, you need to do
img = images[1]
img = img.unsqueeze(dim=0)
img=img.view(1, -1)
# TODO: Calculate the class probabilities (softmax) for img
ps = torch.exp(model(img))
Or you can just use one command instead of two (the unsqueeze is unnecessary):
img = images[1]
img=img.view(1, -1)
I have the same question: Input to reshape is a tensor with 37632 values, but the requested shape has 150528.
writer = tf.python_io.TFRecordWriter("/home/henson/Desktop/vgg/test.tfrecords") # the file to generate
for index, name in enumerate(classes):
class_path = cwd + name +'/'
for img_name in os.listdir(class_path):
img_path = class_path + img_name # path of each image
img = Image.open(img_path)
img = img.resize((224, 224))
img_raw = img.tobytes() # convert the image to bytes
example = tf.train.Example(features=tf.train.Features(feature={
"label": tf.train.Feature(int64_list=tf.train.Int64List(value=[index])),
'img_raw': tf.train.Feature(bytes_list=tf.train.BytesList(value=[img_raw]))
})) # the Example object wraps the label and image data
writer.write(example.SerializeToString()) # serialize to a string and write
writer.close()
def read_and_decode(filename): # read dog_train.tfrecords
filename_queue = tf.train.string_input_producer([filename]) # create a filename queue
reader = tf.TFRecordReader()
_, serialized_example = reader.read(filename_queue) # returns the filename and the file
features = tf.parse_single_example(serialized_example,
features={
'label': tf.FixedLenFeature([], tf.int64),
'img_raw': tf.FixedLenFeature([], tf.string),
}) # take the image data and the label out
img = tf.decode_raw(features['img_raw'], tf.uint8)
img = tf.reshape(img, [224, 224, 3]) # reshape into a 224x224 3-channel image
img = tf.cast(img, tf.float32) * (1. / 255) - 0.5 # emit the normalized img tensor
label = tf.cast(features['label'], tf.int32) # emit the label tensor
print(img,label)
return img, label
images, labels = read_and_decode("/home/henson/Desktop/vgg/TFrecord.tfrecords")
print(images,labels)
images, labels = tf.train.shuffle_batch([images, labels], batch_size=20, capacity=16*20, min_after_dequeue=8*20)
I thought I had resized the image to 224*224 and reshaped it to [224, 224, 3], but it doesn't work. How can I make it work?
The problem is basically related to the shape of the CNN architecture. Let's say I defined the architecture shown in the picture; in the code we define the weights and biases in the following way.
If we look at the weights, let's start with:
wc1: in this layer I defined 32 filters of size 3x3 to be applied
wc2: in this layer I defined 64 filters of size 3x3 to be applied
wc3: in this layer I defined 128 filters of size 3x3 to be applied
wd1: 38*38*128 is the interesting one (where does it come from?)
In the architecture we also defined max pooling.
See the architecture picture for every step:
1. Let's say your input image is 300 x 300 x 1 (in the picture it is 28x28x1).
2. (If the stride is set to 1) each of the 32 filters of size 3x3 sees the 300x300x1 picture, so after applying them we have 32 feature maps of 300x300, i.e. a 300x300x32 volume.
3. After max pooling (with stride 2, which is what is usually used), the size changes from 300 x 300 x 32 to 150 x 150 x 32.
4. (With stride 1) each of the 64 filters of size 3x3 now sees the 150x150x32 volume, so after applying them we have 64 feature maps of 150x150, i.e. a 150x150x64 volume.
5. After max pooling with stride 2, the size changes from 150 x 150 x 64 to 75 x 75 x 64.
6. (With stride 1) each of the 128 filters of size 3x3 now sees the 75x75x64 volume, so after applying them we have 128 feature maps of 75x75, i.e. a 75x75x128 volume.
7. After max pooling, since the spatial size 75x75 is odd, it first needs to be padded (if padding='SAME'), which makes it 76x76; then, with stride 2, the size changes from 76 x 76 x 128 to 38 x 38 x 128.
Now look at 'wd1' in the code picture: this is where 38*38*128 comes from.
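As a quick sanity check of those numbers (a sketch under the stated assumptions: 3x3 convolutions with 'SAME' padding and stride 1, and 2x2 max pooling with 'SAME' padding and stride 2):
import math

def conv_same(h, w, filters):
    # stride-1 'SAME' convolution keeps the spatial size; depth becomes the number of filters
    return h, w, filters

def pool_same(h, w, c):
    # stride-2 'SAME' max pooling halves the spatial size, rounding up (so 75 -> 38)
    return math.ceil(h / 2), math.ceil(w / 2), c

h, w, c = 300, 300, 1
h, w, c = pool_same(*conv_same(h, w, 32))   # 150 x 150 x 32
h, w, c = pool_same(*conv_same(h, w, 64))   # 75 x 75 x 64
h, w, c = pool_same(*conv_same(h, w, 128))  # 38 x 38 x 128
print(h, w, c, h * w * c)                   # 38 38 128 184832 -> the flattened size behind wd1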
I had the same error, so I changed my code from this:
image = tf.decode_raw(image_raw, tf.float32)
image = tf.reshape(image, [img_width, img_height, 3])
to this:
image = tf.decode_raw(image_raw, tf.uint8)
image = tf.reshape(image, [img_width, img_height, 3])
# The type is now uint8 but we need it to be float.
image = tf.cast(image, tf.float32)
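For what it's worth, the two numbers in the error are consistent with exactly this kind of dtype mix-up (a quick sanity check, assuming a 224x224x3 image written as raw uint8 bytes but decoded as 4-byte float32 values):
requested = 224 * 224 * 3        # 150528 values expected by the reshape
stored_bytes = requested * 1     # one byte per uint8 value
as_float32 = stored_bytes // 4   # 4 bytes per float32 value
print(requested, as_float32)     # 150528 37632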
It was because there was somehow a mismatch in my generate_tf_record data format. I serialized it to a string instead of a byte list. The difference I notice between you and me is that you convert your image to bytes. Here's how I write my image to a TFRecord.
file_path, label = sample
image = Image.open(file_path)
image = image.resize((224, 224))
image_raw = np.array(image).tostring()
features = {
'label': _int64_feature(class_map[label]),
'text_label': _bytes_feature(bytes(label, encoding = 'utf-8')),
'image': _bytes_feature(image_raw)
}
example = tf.train.Example(features=tf.train.Features(feature=features))
writer.write(example.SerializeToString())
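For completeness, a matching read side for that write format could look roughly like this (a sketch only, assuming serialized_example comes from a TFRecordReader as in the question and that the image was resized to 224x224 RGB before being written as raw uint8 bytes):
features = tf.parse_single_example(serialized_example, features={
    'label': tf.FixedLenFeature([], tf.int64),
    'image': tf.FixedLenFeature([], tf.string),
})
# Decode with the same dtype that was written (np.array(image).tostring() yields uint8 bytes).
img = tf.decode_raw(features['image'], tf.uint8)
img = tf.reshape(img, [224, 224, 3])     # 224*224*3 = 150528 values, matching the write side
img = tf.cast(img, tf.float32) / 255.0   # only now convert to float for training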
Hope it will help.
I had the same error just like you, and I found the reason behind it. It is because when you store your image with .tostring(), the data is stored in tf.float32 format. Then you decode the TFRecord with decode_raw(tf.uint8), which causes the mismatch error.
I solved it by changing the code to:
image=tf.decode_raw(image_raw,tf.float32)
or:
image=tf.image.decode_jpeg(image_raw,channels=3)
if your image_raw is originally in JPEG format.