I'm using a set of 32x32x32 grayscale images and I want to apply random rotations to the images as part of data augmentation while training a CNN with tflearn + tensorflow. I was using the following code to do so:
# Real-time data preprocessing
img_prep = ImagePreprocessing()
img_prep.add_featurewise_zero_center()
img_prep.add_featurewise_stdnorm()
# Real-time data augmentation
img_aug = ImageAugmentation()
img_aug.add_random_rotation(max_angle=360.)
# Input data
with tf.name_scope('Input'):
    X = tf.placeholder(tf.float32,
                       shape=(None, image_size, image_size, image_size, num_channels),
                       name='x-input')
    Y = tf.placeholder(tf.float32, shape=(None, label_cnt), name='y-input')
# Convolutional network building
network = input_data(shape=[None, 32, 32, 32, 1],
                     placeholder=X,
                     data_preprocessing=img_prep,
                     data_augmentation=img_aug)
(I'm using a combination of tensorflow and tflearn to be able to use the features from both, so please bear with me. Let me know if something is wrong with the way I'm using placeholders, etc.)
I found that add_random_rotation (which itself uses scipy.ndimage.interpolation.rotate) treats the third dimension of my grayscale images as channels (like RGB channels) and rotates all 32 slices along the third dimension by a random angle around the z-axis (i.e., it treats my 3D image as a 2D image with 32 channels). But I want the image to be rotated in space (around all three axes). Do you have any idea how I can do that? Is there a function or package for easily rotating 3D images in space?
import random
import numpy as np
import scipy.ndimage

def random_rotation_3d(batch, max_angle):
    """ Randomly rotate each 3D image in a batch by random angles in (-max_angle, max_angle)
    around each of the three axes.
    Arguments:
        batch: array of shape (batch_size, x, y, z, channels).
        max_angle: `float`. The maximum rotation angle.
    Returns:
        batch of rotated 3D images, with the same shape as the input.
    """
    size = batch.shape
    batch = np.squeeze(batch)
    batch_rot = np.zeros(batch.shape)
    for i in range(batch.shape[0]):
        if bool(random.getrandbits(1)):
            image1 = np.squeeze(batch[i])
            # rotate along z-axis
            angle = random.uniform(-max_angle, max_angle)
            image2 = scipy.ndimage.interpolation.rotate(image1, angle, mode='nearest', axes=(0, 1), reshape=False)
            # rotate along y-axis
            angle = random.uniform(-max_angle, max_angle)
            image3 = scipy.ndimage.interpolation.rotate(image2, angle, mode='nearest', axes=(0, 2), reshape=False)
            # rotate along x-axis
            angle = random.uniform(-max_angle, max_angle)
            batch_rot[i] = scipy.ndimage.interpolation.rotate(image3, angle, mode='nearest', axes=(1, 2), reshape=False)
        else:
            batch_rot[i] = batch[i]
    return batch_rot.reshape(size)
It is more difficult to incorporate into ImageAugmentation(), but the scipy.ndimage.rotate function by default rotates 3D images correctly and takes an axes argument that specifies the plane of rotation (https://docs.scipy.org/doc/scipy-0.16.1/reference/generated/scipy.ndimage.interpolation.rotate.html). Rotating around the first axis (x) means you pass axes=(1, 2); to rotate around the second axis (y), use axes=(0, 2).
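For reference, here is a minimal sketch of how the axes argument selects the rotation plane for a single volume (the array, angle, and mode values are just placeholders):
import numpy as np
from scipy.ndimage import rotate

volume = np.random.random((32, 32, 32))  # placeholder 3D grayscale image
# rotation around the x-axis (first axis) happens in the (y, z) plane
rot_x = rotate(volume, 30, axes=(1, 2), reshape=False, mode='nearest')
# rotation around the y-axis (second axis) happens in the (x, z) plane
rot_y = rotate(volume, 30, axes=(0, 2), reshape=False, mode='nearest')
# rotation around the z-axis (third axis) happens in the (x, y) plane
rot_z = rotate(volume, 30, axes=(0, 1), reshape=False, mode='nearest')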
If you want to rotate any 3D image around the center and keep it in the center, use scipy's affine_transform with an offset, as follows:
import numpy as np
from scipy.ndimage import affine_transform

# create a 3D image
image = np.random.random((20, 20, 20))
# output shape
output_shape = np.array(image.shape)
# rotation matrix around the z-axis
theta = 0.01
cosine = np.cos(theta)
sinus = np.sin(theta)
M = np.array([[cosine, -sinus, 0],
              [sinus,  cosine, 0],
              [0,      0,      1]])
# offset that keeps the rotation centred on the middle of the volume
offset = np.array(image.shape) - M.dot(np.array(output_shape))
offset = offset / 2.0  # halving is important: it maps the output centre back onto the input centre
# affine transformation
f_data = affine_transform(np.asarray(image), np.asarray(M),
                          output_shape=output_shape, offset=offset)
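If you want the rotation to be about all three axes at once, one option (following the same centering recipe as above; the angles below are arbitrary placeholders) is to compose one elementary rotation matrix per axis and pass the product to affine_transform:
import numpy as np
from scipy.ndimage import affine_transform

def rotation_matrix_3d(ax, ay, az):
    # elementary rotations around the x-, y- and z-axis, composed into a single matrix
    rx = np.array([[1, 0, 0],
                   [0, np.cos(ax), -np.sin(ax)],
                   [0, np.sin(ax),  np.cos(ax)]])
    ry = np.array([[ np.cos(ay), 0, np.sin(ay)],
                   [0, 1, 0],
                   [-np.sin(ay), 0, np.cos(ay)]])
    rz = np.array([[np.cos(az), -np.sin(az), 0],
                   [np.sin(az),  np.cos(az), 0],
                   [0, 0, 1]])
    return rz.dot(ry).dot(rx)

image = np.random.random((20, 20, 20))
M = rotation_matrix_3d(0.1, 0.2, 0.3)
# same centering trick as above
offset = (np.array(image.shape) - M.dot(np.array(image.shape))) / 2.0
rotated = affine_transform(image, M, output_shape=image.shape, offset=offset)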
Related
I'm trying to generate a 3D image from a stack of 2D grayscale images in Python. I currently have the images, the mask, and the mask output. I tried creating an ndarray by adding an axis to my images, but this didn't seem to work.
This is what I wrote:
import cv2
import numpy as np
import numpy.ma as ma
from skimage.color import rgb2gray
from skimage.draw import polygon2mask

# load images
images_gray = []
#x, y = images[0].shape
#z = len(frames)
#threeD = np.ndarray([x, y, z])  # 3D
threeD = []
for i in range(len(images)):
    frame = cv2.imread(path + '/images/' + str(i))
    # convert to grayscale then save
    images_gray.append(rgb2gray(frame))
    # create a polygon
    coordinates = coord[i]
    coordinates = [[y, x] for [x, y] in coordinates]  # change order for polygon2mask
    polygon = np.array(coordinates)
    # create a mask
    mask = polygon2mask(images_gray[i].shape, polygon)
    # apply mask
    result = ma.masked_array(images_gray[i], np.invert(mask))
    temp = result[..., np.newaxis]
    threeD.append(temp)
The resulting output shape for threeD is (# of frames, image height, image width, 1). I don't know where the 1 comes from, and I also expected the order to be (x, y, z) = (image height, image width, # of frames). The output is wrong, and I wasn't able to view it with plt as I got a TypeError saying the shape is invalid.
For z, I thought about setting a value of 0.1 to represent the slice thickness, but I'm not sure how to set that up.
I'm also not sure whether my approach is correct; do I have to create a point cloud instead? A mesh? Any suggestions?
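For reference, a minimal sketch of where these shapes come from (the placeholder slices stand in for the masked grayscale frames): appending result[..., np.newaxis] to a list and converting it to an array produces the trailing 1, while stacking the plain 2D slices along a new last axis gives the expected (height, width, # of frames) order.
import numpy as np

slices = [np.zeros((101, 101)) for _ in range(5)]        # placeholder 2D grayscale slices

# appending slice[..., np.newaxis] and converting the list to an array
as_list = np.array([s[..., np.newaxis] for s in slices])
print(as_list.shape)                                     # (5, 101, 101, 1)

# stacking the plain 2D slices along a new last axis
volume = np.stack(slices, axis=-1)
print(volume.shape)                                      # (101, 101, 5)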
I want to move the image by 1 or 2 pixels, so I specified small numbers (1.25, 1.9) in the affine matrix.
BUT the image is moved far, far away, by hundreds of pixels:
(my input image is fully filled with yellow pineapples)
Below is a working example.
import torch
import numpy as np
import matplotlib.pyplot as plt
from torchvision import datasets, transforms
import torch.nn.functional as F

rotation_simple = np.array([[1, 0, 1.25],
                            [0, 1, 1.9]])
# load image
transform = transforms.Compose([transforms.Resize(255),
                                transforms.CenterCrop(224),
                                transforms.ToTensor()])
dataloader = torch.utils.data.DataLoader(datasets.ImageFolder('/home/Pictures', transform=transform), shuffle=True)
dtype = torch.FloatTensor

# convert the affine matrix to a tensor and add the batch dimension once, outside the loop
rotation_simple = torch.as_tensor(rotation_simple)[None]

i = 0
while i < 3:
    img, labels = next(iter(dataloader))
    img = img  # .double()  # sometimes you need to convert to double, sometimes you don't
    grid = F.affine_grid(rotation_simple, img.size()).type(dtype)
    x = F.grid_sample(img, grid)
    plt.imshow(x[0].permute(1, 2, 0))
    plt.show()
    i += 1
I wonder why the function moves the image so far away instead of moving it by just 1 pixel in the x and y directions.
P.S. Setting align_corners=True didn't help in this case.
P.P.S. My PyTorch version is 1.4.0+cu100.
The "unit of measures" for the grid and the affine transformation are not pixels, but rather normalized coordinates:
grid specifies the sampling pixel locations normalized by the input spatial dimensions. Therefore, it should have most values in the range of [-1, 1]. For example, values x = -1, y = -1 is the left-top pixel of input, and values x = 1, y = 1 is the right-bottom pixel of input.
Therefore, translating by [1.25, 1.9] in these units is actually translating by roughly 0.6 and 0.95 of the image size. To get pixel-wise translations, express the shift in normalized units: use 2 * shift_in_pixels / img_size in the affine matrix, since the full image size corresponds to a range of 2 in normalized coordinates.
See the doc for grid_sample for more information.
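A minimal sketch of the conversion, with placeholder tensor names and sizes (note that affine_grid describes where the output samples the input, so a positive offset in theta shifts the visible content in the opposite direction):
import torch
import torch.nn.functional as F

img = torch.rand(1, 3, 224, 224)                  # N, C, H, W placeholder
shift_x_px, shift_y_px = 1.25, 1.9                # desired shift in pixels
H, W = img.shape[-2:]
# normalized coordinates span [-1, 1], i.e. a range of 2 over the full image size
theta = torch.tensor([[[1.0, 0.0, 2 * shift_x_px / W],
                       [0.0, 1.0, 2 * shift_y_px / H]]])
grid = F.affine_grid(theta, img.size())
out = F.grid_sample(img, grid)                    # shifted by roughly one or two pixels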
I want to make an affine transformation and afterwards use nearest-neighbor interpolation, while keeping the same dimensions for the input and output images. I use, for example, the scaling transformation T = [[2,0,0],[0,2,0],[0,0,1]]. Any idea how I can fill the black pixels with nearest neighbor? I tried giving them the minimum value of their neighbors' intensities. For example, if a pixel has neighbors [55,22,44,11,22,55,23,231], I give it the value of the minimum intensity: 11. But the result is not at all clear.
import numpy as np
from matplotlib import pyplot as plt

# Import the original image and initialize the output image
img = plt.imread('/home/left/Desktop/computerVision/SET1/brain0030slice150_101x101.png')
outImg = np.zeros_like(img)
# Dimensions of the input image and output image (the same dimensions)
(width, height) = (img.shape[0], img.shape[1])
# Initialize the transformation matrix
T = np.array([[2, 0, 0], [0, 2, 0], [0, 0, 1]])
# Make an array with the input image's (x, y) coordinates and add a homogeneous row of ones
coords = np.indices((width, height), 'uint8').reshape(2, -1)
coords = np.vstack((coords, np.ones(coords.shape[1], 'uint8')))
output = T @ coords
# Arrays of x and y coordinates of the output image within the image dimensions
x_array, y_array = output[0], output[1]
indices = np.where((x_array >= 0) & (x_array < width) & (y_array >= 0) & (y_array < height))
# Final coordinates of the output image
fx, fy = x_array[indices], y_array[indices]
# Final output image after the affine transformation:
# each input pixel is placed at its transformed location, which leaves the black gaps described above
outImg[fx, fy] = img[coords[0][indices], coords[1][indices]]
The input image is:
The output image after scaling is:
Well, you could simply use the OpenCV resize function:
import cv2
new_image = cv2.resize(image, new_dim, interpolation=cv2.INTER_AREA)  # or cv2.INTER_NEAREST for true nearest-neighbour interpolation
it'll do the resize and fill in the empty pixels in one go
more on cv2.resize
If you need to do it manually, then you could simply detect the dark pixels in the resized image and change their value to the mean of the 4 neighbouring pixels, for example; it depends on your required algorithm. A rough sketch of this idea follows below.
See: nearest neighbour, bilinear, bicubic, etc.
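For illustration, here is one rough sketch of that manual fill, assuming the gaps are exactly zero-valued: replace each zero pixel with the value of its nearest non-zero pixel using scipy's distance transform (the array name scaled is a placeholder for the forward-mapped image).
import numpy as np
from scipy.ndimage import distance_transform_edt

def fill_holes_nearest(scaled):
    # holes are the zero-valued pixels left by the forward mapping
    holes = (scaled == 0)
    # for every pixel, the index of the nearest non-hole pixel
    nearest = distance_transform_edt(holes, return_distances=False, return_indices=True)
    # copy the nearest non-hole value into each hole
    return scaled[tuple(nearest)]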
So I have preprocessed some DICOM images to feed a neural network, and in the image augmentation step the image data generator expects a 4D input while my data is 3D (200, 420, 420).
I tried reshaping the array and expanding dimensions, but in both cases I cannot plot the individual images in the array (plotting expects an image with shape (420, 420), while my new images have shape (420, 420, 1)).
Here is my code.
I have three functions to convert DICOM images into images with good contrast.
This one converts the raw pixel values to Hounsfield units:
def transform_to_hu(medical_image, image):
    intercept = medical_image.RescaleIntercept
    slope = medical_image.RescaleSlope
    hu_image = image * slope + intercept
    return hu_image
This one windows the image values:
def window_image(image, window_center, window_width):
    img_min = window_center - window_width // 2
    img_max = window_center + window_width // 2
    window_image = image.copy()
    window_image[window_image < img_min] = img_min
    window_image[window_image > img_max] = img_max
    return window_image
And this function loads the image:
def load_image(file_path):
    medical_image = dicom.read_file(file_path)
    image = medical_image.pixel_array
    hu_image = transform_to_hu(medical_image, image)
    brain_image = window_image(hu_image, 40, 80)
    return brain_image
Then I load my images:
files = sorted(glob.glob(r'F:\CT_Data_Classifier\*.dcm'))
images = np.array([load_image(path) for path in files])
images.shape returns (200, 512, 512)
Everything is fine about the data; for example, I can plot the 100th image with
plt.imshow(images[100]) and it plots an image.
I then feed the data into the image data generator:
train_image_data = ImageDataGenerator(
    rescale=1./255,
    shear_range=0.,
    zoom_range=0.05,
    rotation_range=180,
    width_shift_range=0.05,
    height_shift_range=0.05,
    horizontal_flip=True,
    vertical_flip=True,
    fill_mode='constant',
    cval=0)
But then, when I try to plot with this code:
plt.figure(figsize=(12, 12))
for X_batch, y_batch in train_image_data.flow(trainX, trainY, batch_size=9):
    for i in range(0, 9):
        plt.subplot(330 + 1 + i)
        plt.imshow(X_batch[i])
    plt.show()
    break
it returns
ValueError: ('Input data in "NumpyArrayIterator" should have rank 4. You passed an array with shape', (162, 420, 420))
I tried expand_dims and reshape to add an extra dimension at the end of the array to represent channels, but then it returns
TypeError: Invalid shape (420, 420, 1) for image data
in the plt.imshow stage.
I'm a doctor and not an experienced programmer, so I would really appreciate your help. Cheers.
You are correct in adding an extra dimension to represent channels. That part seems fine. The problem is with plotting. For that, you can use:
plt.matshow(x[..., 0])
where x is the 3D array. The syntax x[..., 0] means take index 0 of the last dimension of array x. The ellipsis (...) is shorthand to fill in the dimensions. For a 3D array, the equivalent call would be x[:, :, 0].
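A minimal sketch putting the two pieces together (the array here is a random placeholder with the shapes from the question):
import numpy as np
import matplotlib.pyplot as plt

images = np.random.random((200, 420, 420))     # placeholder for the loaded DICOM slices

# add a trailing channel dimension so the generator sees rank-4 data
trainX = images[..., np.newaxis]               # shape (200, 420, 420, 1)

# to plot a single image, drop the channel axis again
plt.matshow(trainX[100][..., 0])               # equivalently trainX[100, :, :, 0]
plt.show()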
I have an image and 3 points. I want to rotate the image and the points together. To this end, I rotate the image by some angle a and the points by the same angle.
When a is fixed to a python scalar (say pi/3), the rotation works fine (cf. image below, the blue dots are on the dark squares).
When the angle is randomly chosen with angle = tf.random_uniform([]), there is an offset between the rotated image and the rotated points.
Below is the full code reproducing this behaviour.
My question is: how to explain this behaviour and correct it?
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
# create toy image
square = np.zeros((1, 800, 800, 3))
square[:, 100:400, 100:400] = 1
square[:, 140:180, 140:180] = 0
square[:, 240:280, 240:280] = 0
square[:, 280:320, 280:320] = 0
kp = np.array([[160, 160], [260, 260], [300, 300]])
kp = np.expand_dims(kp, axis=0)
def _rotate(image, keypoints, angle, keypoints_num):
    image = tf.contrib.image.rotate(image, angle)
    cos, sin = tf.cos(angle), tf.sin(angle)
    x0, y0 = .5, .5
    rot_mat = tf.Variable([[cos, -sin], [sin, cos]], trainable=False)
    keypoints -= (x0, y0)
    keypoints = tf.reshape(keypoints, shape=[-1, 2])
    keypoints = tf.matmul(keypoints, rot_mat)
    keypoints = tf.reshape(keypoints, shape=[-1, keypoints_num, 2])
    keypoints += (x0, y0)
    return image, keypoints
image = tf.placeholder(tf.float32, [None, 800, 800, 3])
keypoints = tf.placeholder(tf.float32, [None, 3, 2])
angle = np.pi / 3 # fix angle, works fine
#angle = tf.random_uniform([]) # random angle, does not work
image_r, keypoints_r = _rotate(image, keypoints / 800, angle, 3)
keypoints_r *= 800
sess = tf.Session()
sess.run(tf.initialize_all_variables())
imr, kr = sess.run([image_r, keypoints_r], feed_dict={image: square, keypoints:kp})
# displaying output
plt.imshow(imr[0])
plt.scatter(*zip(*kr[0]))
plt.savefig('rotation.jpg')
The problem is here:
rot_mat = tf.Variable([[cos, -sin], [sin, cos]], trainable=False)
Since rot_mat is a variable, its value is being set only when variables are initialized, here:
sess.run(tf.initialize_all_variables())
So at that point rot_mat gets some value (using cos and sin, which in turn depend on angle, which is random) and it does not change anymore. Then when you do:
imr, kr = sess.run([image_r, keypoints_r], feed_dict={image: square, keypoints: kp})
It is a different call to run, so tf.random_uniform produces a new value, but rot_mat still keeps the same value from when it was initialized. Since the image is rotated with:
image = tf.contrib.image.rotate(image, angle)
And the key points are rotated with:
keypoints = tf.matmul(keypoints, rot_mat)
The rotations do not match. The easiest fix is not to use a variable for rot_mat:
rot_mat = [[cos, -sin], [sin, cos]]
With this, the code works fine. If you really need rot_mat to be a variable, it is possible, but it is a bit more work and it does not seem to be needed here. If you do not like rot_mat being a list and want to have a proper tensor instead, you can use tf.convert_to_tensor:
rot_mat = tf.convert_to_tensor([[cos, -sin], [sin, cos]])
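For completeness, a minimal self-contained sketch of that tensor form (assuming the same TF 1.x setup as the question; the keypoint values are placeholders). Because rot_mat is now an ordinary tensor, it is re-evaluated together with angle on every sess.run, which is what makes the image and keypoint rotations match in the question's code:
import tensorflow as tf

angle = tf.random_uniform([])
cos, sin = tf.cos(angle), tf.sin(angle)
# a plain tensor instead of a tf.Variable
rot_mat = tf.convert_to_tensor([[cos, -sin], [sin, cos]])

keypoints = tf.constant([[0.2, 0.2], [0.325, 0.325], [0.375, 0.375]])
rotated = tf.matmul(keypoints, rot_mat)

with tf.Session() as sess:
    print(sess.run(rotated))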