import tensorflow as tf

IMG_SIZE = 160  # 160x160

def formatting(image, label):
    image = tf.cast(image, tf.float32)  # convert every pixel to float32
    image = (image / 127.5) - 1  # rescale from [0, 255] to [-1, 1]
    image = tf.image.resize(image, (IMG_SIZE, IMG_SIZE))
    return image, label
I understand that images are usually divided by 255.0 so the pixels fall in the range [0, 1]. But I'm confused why in this case the image is divided by 127.5, which is half of 255, and then 1 is subtracted.
Dividing the image by 127.5 and subtracting 1 normalizes the pixel values to the range [-1, 1]: a pixel value of 0 maps to 0/127.5 - 1 = -1, and 255 maps to 255/127.5 - 1 = 1. Some pretrained models expect inputs in this range rather than [0, 1].
See also this answer: https://datascience.stackexchange.com/questions/54296/should-input-images-be-normalized-to-1-to-1-or-0-to-1
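For comparison, here is a minimal sketch (using a stand-in random image with values in [0, 255]) showing both conventions side by side:

import tensorflow as tf

image = tf.random.uniform((160, 160, 3), maxval=256.0)  # stand-in image in [0, 255)

zero_one = image / 255.0             # [0, 255] -> [0, 1]
minus_one_one = image / 127.5 - 1.0  # [0, 255] -> [-1, 1]

# Both are linear rescalings; the right one is whichever range the model
# (or its pretrained weights) was trained with.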
I'm trying to resize grayscale images (img) and mask images (mask) so that I can feed them all into a neural network model.
I'm facing this error:
could not broadcast input array from shape (256,256) into shape (256,256,1)
The training path (train_paths) has 15 images and their masks:
for epoch in range(nepochs):
    shuffle(train_paths)  # shuffle the training paths to avoid overfitting
    out_imgs = np.zeros((15,) + (256, 256) + (1,))   # define the input images for the model
    out_masks = np.zeros((15,) + (256, 256) + (1,))  # define the input masks for the model
    for i, img_mask_path in enumerate(train_paths):
        img, mask = img_mask_path[i][0], img_mask_path[i][1]
        #out_imgs = np.zeros((256,256))
        img = cv2.resize(img, (256, 256))
        mask = cv2.resize(mask, (256, 256))
        #img.ravel()
        out_imgs.reshape((15, 256, 256, 1)).ravel()
        out_imgs[i, ...] = img    # create single array of images
        out_masks[i, ...] = mask  # create single array of masks
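The broadcast fails because cv2.resize returns a (256, 256) array while the destination slice out_imgs[i, ...] has shape (256, 256, 1); note also that the unassigned reshape(...).ravel() call does not change out_imgs. A minimal sketch of one possible fix, adding the missing channel axis before assignment:

img = cv2.resize(img, (256, 256))
mask = cv2.resize(mask, (256, 256))
out_imgs[i, ...] = np.expand_dims(img, axis=-1)   # (256, 256) -> (256, 256, 1)
out_masks[i, ...] = np.expand_dims(mask, axis=-1)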
I'm trying to load a dataset into TensorFlow, preprocess it, and then create batches to feed to a GAN, but for some reason some of the images have 4 channels, which raises:
Cannot add tensor to the batch: number of elements does not match. Shapes are: [tensor]: [128,128,4], [batch]: [128,128,3] [Op:IteratorGetNext]
This is the function that preprocesses the data and then batches it:
BATCH_SIZE = 32

def map_images(file):
    img = tf.io.decode_jpeg(tf.io.read_file(file))
    img = tf.dtypes.cast(img, tf.float32)
    img = tf.image.resize(img, size=[128, 128])
    img = img / 255.0
    reimg = tf.reshape(img, [128, 128, 3])
    return reimg

# create training batches
filename_dataset = tf.data.Dataset.list_files("/content/drive/MyDrive/Dataset/Damage type dataset/Damage type/Broken Glass/*.JPG")
image_dataset = filename_dataset.map(map_images).batch(BATCH_SIZE)
I solved it by setting the channels parameter to 3 in tf.io.decode_jpeg. Its default is 0, which means the decoded image keeps however many channels the file has, so setting it to 3 forces all images to be decoded with exactly 3 channels.
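A minimal sketch of the corrected map function (identical to the code above except for the added channels argument):

def map_images(file):
    img = tf.io.decode_jpeg(tf.io.read_file(file), channels=3)  # always decode 3 channels
    img = tf.image.resize(tf.dtypes.cast(img, tf.float32), size=[128, 128])
    return img / 255.0  # shape is now guaranteed to be [128, 128, 3]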
I am using AlexNet for object recognition. I trained my model on images of size (277, 277). Then I used the selective search algorithm to extract regions from an image and fed those regions to the network for testing/prediction.
However, when I resize the image regions (from selective search), it gives an error.
Code for resizing the training images:
try:
    img_array = cv2.imread(os.path.join(path, img))
    new_array = cv2.resize(img_array, (IMG_SIZE, IMG_SIZE))
    gray_img = cv2.cvtColor(new_array, cv2.COLOR_BGR2GRAY)
    training_data.append([gray_img, class_num])
except Exception as e:
    pass
Code for resizing the selected image regions:
img_lbl, regions = selectivesearch.selective_search(img, scale=500, sigma=0.4, min_size=10)
for r in regions:
    x, y, w, h = r['rect']
    segment = img[y:y + h, x:x + w]
    gray_img = cv2.resize(segment, (277, 277))
    gray_img = cv2.cvtColor(gray_img, cv2.COLOR_BGR2GRAY)
    gray_img = np.array(gray_img).reshape(-1, 277, 277, 1)
    gray_img = gray_img / 255.0
    prediction = model.predict(gray_img)
It gives an error on the last line, i.e.:
prediction = model.predict(gray_img)
and the error is:
Error: Error when checking input: expected conv2d_1_input to have shape (227, 227, 1) but got array with shape (277, 277, 1)
When both shapes are the same, why does it give this error?
Your model expects a tensor as input, but you are trying to evaluate it on a NumPy array. Instead, use a placeholder of the given shape and then feed your array into this placeholder in a session.
# define a placeholder for input
image = tf.placeholder(dtype=tf.float32, name="image", shape=[277, 277, 1])
prediction = model.predict(image)

# evaluate each of your resized images in a session
with tf.Session() as sess:
    for r in regions:
        x, y, w, h = r['rect']
        # rest of your code from the loop here
        gray_img = gray_img / 255.
        p = sess.run(prediction, feed_dict={image: gray_img})
        print(p)  # print the model's prediction for this image
Maybe you should take a look at this question: What's the difference between tf.placeholder and tf.Variable?
It's all in the title: I just want to know how to load my own test data (image.jpg) in PyTorch in order to test my CNN.
You need to feed images to the net the same way as in training: that is, you should apply exactly the same transformations to get similar results.
Assuming your net was trained using this code (or similar), you can see that an input image (for validation) undergoes the following transformations:
transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    normalize,
])
Following torchvision.transforms docs you can see that an input image goes through:
Resizing to 256x256 pix
Cropping a 224x224 rect from the center of the image
Conversion from uint8 to float in the range [0, 1], and transposition to a 3-by-224-by-224 array
Normalization by subtracting the mean and dividing by the std
You can do all this manually for any image:
import numpy as np
from PIL import Image

pil_img = Image.open('image.jpg').resize((256, 256), Image.BILINEAR)  # read and resize
# center crop
w, h = pil_img.size
i = int(round((h - 224) / 2.))
j = int(round((w - 224) / 2.))
pil_img = pil_img.crop((j, i, j + 224, i + 224))
np_img = np.array(pil_img).astype(np.float32) / 255.
np_img = np.transpose(np_img, (2, 0, 1))
# normalize each channel in place
mean = [0.485, 0.456, 0.406]
std = [0.229, 0.224, 0.225]
for c in range(3):
    np_img[c, ...] = (np_img[c, ...] - mean[c]) / std[c]
Once you have np_img ready for your model, you can run a feed-forward pass (converting the NumPy array to a torch tensor first):
import torch
pred = model(torch.from_numpy(np_img)[None, ...])  # note that we add a singleton leading dim for the batch
Thanks for your response. My problem was loading the test data, and I found a solution:
test_data = datasets.ImageFolder('root/test_cnn', transform=transform)
For example, if I have two directories, cat and dog (inside the test_cnn directory), that contain images, the ImageFolder object will automatically assign the classes cat and dog to my images.
During testing, I just have to drop the classes.
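A minimal sketch of that approach (assuming the validation transform from above and a hypothetical root/test_cnn directory with one subfolder per class):

from torch.utils.data import DataLoader
from torchvision import datasets, transforms

transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

test_data = datasets.ImageFolder('root/test_cnn', transform=transform)
test_loader = DataLoader(test_data, batch_size=32, shuffle=False)

for images, labels in test_loader:  # labels are derived from the folder names
    preds = model(images)           # ignore `labels` if you only need predictions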
I have the same question: Input to reshape is a tensor with 37632 values, but the requested shape has 150528.
writer = tf.python_io.TFRecordWriter("/home/henson/Desktop/vgg/test.tfrecords")  # file to generate
for index, name in enumerate(classes):
    class_path = cwd + name + '/'
    for img_name in os.listdir(class_path):
        img_path = class_path + img_name  # path of each image
        img = Image.open(img_path)
        img = img.resize((224, 224))
        img_raw = img.tobytes()  # convert the image to binary format
        example = tf.train.Example(features=tf.train.Features(feature={
            "label": tf.train.Feature(int64_list=tf.train.Int64List(value=[index])),
            'img_raw': tf.train.Feature(bytes_list=tf.train.BytesList(value=[img_raw]))
        }))  # the Example object wraps the label and image data
        writer.write(example.SerializeToString())  # serialize to a string
writer.close()
def read_and_decode(filename):  # read in dog_train.tfrecords
    filename_queue = tf.train.string_input_producer([filename])  # create a queue
    reader = tf.TFRecordReader()
    _, serialized_example = reader.read(filename_queue)  # return the file name and the file
    features = tf.parse_single_example(serialized_example,
                                       features={
                                           'label': tf.FixedLenFeature([], tf.int64),
                                           'img_raw': tf.FixedLenFeature([], tf.string),
                                       })  # extract the image data and the label
    img = tf.decode_raw(features['img_raw'], tf.uint8)
    img = tf.reshape(img, [224, 224, 3])  # reshape to a 224x224 3-channel image
    img = tf.cast(img, tf.float32) * (1. / 255) - 0.5  # emit the img tensor into the pipeline
    label = tf.cast(features['label'], tf.int32)  # emit the label tensor into the pipeline
    print(img, label)
    return img, label
images, labels = read_and_decode("/home/henson/Desktop/vgg/TFrecord.tfrecords")
print(images,labels)
images, labels = tf.train.shuffle_batch([images, labels], batch_size=20, capacity=16*20, min_after_dequeue=8*20)
I thought I had resized the img to 224x224 and reshaped it to [224, 224, 3], but it doesn't work. How can I fix it?
The problem is basically related to the shape of the CNN architecture. Say I defined the architecture shown in the picture; in the code we define the weights and biases in the following way.
If we look at the weights, let's start with:
wc1: in this layer I defined 32 filters of size 3x3 to be applied
wc2: in this layer I defined 64 filters of size 3x3 to be applied
wc3: in this layer I defined 128 filters of size 3x3 to be applied
wd1: 38*38*128 is the interesting one (where does it come from?).
In the architecture we also use max-pooling. See the architecture picture at every step:
1. Say your input image is 300x300x1 (in the picture it is 28x28x1).
2. (If the stride is set to 1) each of the 32 filters of size 3x3 slides over the 300x300x1 picture, so after the first convolution we have 32 feature maps of 300x300; the output volume is 300x300x32.
3. After max-pooling (with stride 2, which is the usual choice) the size changes from 300x300x32 to 150x150x32.
4. (If the stride is set to 1) each of the 64 filters of size 3x3 now sees the 150x150x32 volume, so after the second convolution the output is 150x150x64.
5. After max-pooling with stride 2 the size changes from 150x150x64 to 75x75x64.
6. (If the stride is set to 1) each of the 128 filters of size 3x3 now sees the 75x75x64 volume, so after the third convolution the output is 75x75x128.
7. Before the last max-pooling the spatial size is 75x75 (odd), so with padding='SAME' it is first padded to 76x76 (even); with stride 2 the size then changes from 76x76x128 to 38x38x128.
Now look at wd1 in the code picture: this is where 38*38*128 comes from.
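To sanity-check this arithmetic, here is a minimal sketch using tf.keras (an assumption for illustration; the original code defines the weight dicts by hand) with stride-1 'same' convolutions and 2x2, stride-2, 'same'-padded max-pooling:

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(300, 300, 1)),
    tf.keras.layers.Conv2D(32, 3, padding='same', activation='relu'),   # wc1 -> 300x300x32
    tf.keras.layers.MaxPool2D(2, strides=2, padding='same'),            # -> 150x150x32
    tf.keras.layers.Conv2D(64, 3, padding='same', activation='relu'),   # wc2 -> 150x150x64
    tf.keras.layers.MaxPool2D(2, strides=2, padding='same'),            # -> 75x75x64
    tf.keras.layers.Conv2D(128, 3, padding='same', activation='relu'),  # wc3 -> 75x75x128
    tf.keras.layers.MaxPool2D(2, strides=2, padding='same'),            # 75 is odd -> 38x38x128
    tf.keras.layers.Flatten(),                                          # -> 38*38*128, wd1's input
])
model.summary()  # prints each layer's output shape, matching the walkthrough above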
I had the same error, so I changed my code from this:
image = tf.decode_raw(image_raw, tf.float32)
image = tf.reshape(image, [img_width, img_height, 3])
to this:
image = tf.decode_raw(image_raw, tf.uint8)
image = tf.reshape(image, [img_width, img_height, 3])
# The type is now uint8 but we need it to be float.
image = tf.cast(image, tf.float32)
It is because there was a mismatch in the data format of my generate_tf_record: I serialized the image to a string instead of a byte list. The difference between your code and mine is that you converted your image to bytes. Here's how I write my image to a tfrecord:
file_path, label = sample
image = Image.open(file_path)
image = image.resize((224, 224))
image_raw = np.array(image).tostring()
features = {
    'label': _int64_feature(class_map[label]),
    'text_label': _bytes_feature(bytes(label, encoding='utf-8')),
    'image': _bytes_feature(image_raw)
}
example = tf.train.Example(features=tf.train.Features(feature=features))
writer.write(example.SerializeToString())
Hope it helps.
I had the same error, and I found the reason behind it. When you store your image with .tostring(), the bytes keep the array's original dtype; in my case that was float32, so decoding the tfrecord with decode_raw(..., tf.uint8) causes the mismatch error.
I solved it by changing the code to:
image = tf.decode_raw(image_raw, tf.float32)
or:
image = tf.image.decode_jpeg(image_raw, channels=3)
if your image_raw was originally in JPEG format.
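To make the dtype rule concrete, here is a minimal round-trip sketch (with a hypothetical stand-in array, using the TF2-style tf.io names): the dtype passed to decode_raw must match the dtype the bytes were written with.

import numpy as np
import tensorflow as tf

img = np.zeros((224, 224, 3), dtype=np.uint8)  # stand-in image
raw = img.tobytes()  # 224*224*3 = 150528 bytes

decoded = tf.io.decode_raw(raw, tf.uint8)     # matches the stored dtype
decoded = tf.reshape(decoded, [224, 224, 3])  # 150528 values -> OK

# Decoding the same bytes as float32 would yield 150528 / 4 = 37632 values,
# which is exactly the "37632 vs 150528" mismatch in the question above.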