I'm currently working on normalizing CT scans (x, y, layer). Normalizing the first two dimensions is simple using cv2.resize, but the third dimension... My idea is to flatten the first two dimensions to get a 2D numpy array. However, if I resize each layer to (x * y, 1) and then resize it back to (x, y), I get a completely different image: I start with an image of a lung and end up with lines of different gray values afterwards.
test = cv2.resize(img, (img.shape[0] * img.shape[1], 1), interpolation=cv2.INTER_LINEAR)
test = cv2.resize(test, (159, 159), interpolation=cv2.INTER_LINEAR)
self.print_prediction(test, cv2.resize(temp2_masks[:, 0], (159, 159)),
                      color=False, shape=(159, 159))
I'm sure it's some kind of simple mistake, but I don't see it. So I would be very grateful for help.
The cv2.resize function does not reshape your array.
It actually resizes the image. Because cv2.resize takes its target size as (width, height), your first line squashes the image down to a single row of height 1 while stretching it enormously in width, so the pixel values are interpolated and not preserved at all.
Use numpy.reshape to reshape your arrays instead.
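For reference, a minimal sketch of the reshape-based approach; the shape (159, 159, 40) is just an assumed stand-in for your CT volume:
import numpy as np

volume = np.random.rand(159, 159, 40).astype(np.float32)  # stand-in for an (x, y, layers) CT volume

flat = volume.reshape(-1, volume.shape[2])   # (x * y, layers): values are untouched, only the view changes
restored = flat.reshape(volume.shape)        # back to (x, y, layers)

assert np.array_equal(volume, restored)      # reshape is lossless, unlike cv2.resize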
I created a mask with shape (128, 128, 128). I then took np.sum on that mask, n_voxels_flattened = np.sum(mask), and removed the zero voxels so that I can do the transformation on the non-zero voxels only, which gives n_voxels_flattened = 962517. I then iterated over all 300 images, which resulted in an array of shape (962517, 300). After some adjustments the output has the same shape as the input, (962517, 300). I now want to reshape this array and put back the zero voxels I removed, so that it has shape (128, 128, 128, 300).
This is what I tried, but it resulted in a weird looking image when visualized.
zero_array = np.zeros((128*128*128 * 300))
zero_array[:len(result.reshape(-1))]=result.reshape(-1)
Any help would be much appreciated.
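A minimal sketch of one way to put the removed voxels back, assuming the mask can be treated as a boolean array (dummy data stands in for your mask and result):
import numpy as np

mask = np.random.rand(128, 128, 128) > 0.5                         # stand-in boolean mask of kept voxels
result = np.random.rand(int(mask.sum()), 300).astype(np.float32)   # stand-in for the processed (n_voxels, 300) array

restored = np.zeros(mask.shape + (300,), dtype=result.dtype)       # zero-filled target volume
restored[mask] = result                                            # each row goes back to its original voxel position
print(restored.shape)                                              # (128, 128, 128, 300)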
I'm training a YOLO model using cv2.dnn and blobFromImage. I have a df with all the image paths, which I iterate over to obtain the features through blobFromImage. So far, I have this:
for i in df.iloc:
    img = cv2.imread(str(i[8]))
    height, width, shape = img.shape
    blob = cv2.dnn.blobFromImage(img, 1/255, (416, 416), (0, 0, 0), True, crop=False)  # extract features. Normalize and resize. Swap RGB colours
    print(blob.shape)
    net = cv2.dnn.readNet(path_cfg, path_weights)
    layer_names = net.getLayerNames()
    outputlayers = [layer_names[i[0] - 1] for i in net.getUnconnectedOutLayers()]
    net.setInput(blob)
    outs = net.forward(outputlayers)
All my images have shape (1024, 1024, 3). When I pass the df into the code, blob.shape is (1, 3, 416, 416) in the majority of cases. However, for some images the blob comes out with a different shape, such as (1, 3, 814, 450). The interesting thing is that if I create a df1 with only that specific image path and pass it into the loop, the blob's shape turns out correctly as (1, 3, 416, 416). Therefore, I'm assuming that it takes some values from the previously passed images.
I would highly appreciate any help explaining why this is happening and how to solve it, so that all blobs have shape (1, 3, 416, 416).
Many thanks in advance.
I expect all blobs to have shape (1, 3, 416, 416); some turn out to be different, although all original images have the same shape.
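As a hedged sanity check (not a diagnosis of the root cause): with a fixed target size, blobFromImage should return (1, 3, 416, 416) for any 3-channel input, so logging the offending paths in isolation can help narrow things down. check_blob_shapes below is a hypothetical helper; it could be called, for example, as check_blob_shapes(df.iloc[:, 8]), following the column index used in the question.
import cv2

def check_blob_shapes(image_paths):
    """Print any image whose blob does not come out as (1, 3, 416, 416)."""
    for path in image_paths:
        img = cv2.imread(str(path))
        if img is None:                      # skip unreadable files instead of crashing later
            print('could not read', path)
            continue
        blob = cv2.dnn.blobFromImage(img, 1 / 255, (416, 416), (0, 0, 0), True, crop=False)
        if blob.shape != (1, 3, 416, 416):
            print(path, img.shape, blob.shape)   # log which image produces an unexpected blob shape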
I'm trying to use the compare_ssim function. I currently have two 2xN matrices of x, y coordinates, where the first row is all the x coordinates and the second row is all the y coordinates for each of the two images. How can I calculate the SSIM for these two images (if there is a way to do so)?
For example I have:
X = np.array([[1,2,3], [4,5,6]])
Y = np.array([[3,4,5],[5,6,7]])
compare_ssim(X,Y)
But I am getting the error
ValueError: win_size exceeds image extent. If the input is a multichannel (color) image, set multichannel=True.
I'm not sure if I am missing a parameter, or if I should convert the matrices so that this function works, or if there is a way I'm supposed to convert my coordinates to a grayscale matrix. I'm a bit confused about what the matrices passed to the function should look like. I know they are supposed to be ndarrays, but type(X) and type(Y) are both numpy.ndarray.
Since you haven't mentioned which framework/library you are using, I am going with the assumption that you are using skimage's compare_ssim.
The error in question is due to the shape of your inputs. You can find more details here.
TL;DR: compare_ssim expects images with shape (H, W, C), but your inputs have shape (2, 3), so the function cannot tell which dimension to treat as the channel dimension. When multichannel=True, the last dimension is treated as the channel dimension.
There are two key problems with your code:
compare_ssim expects images as input, so your X and Y matrices should have the dimensions (H, W, C) and not (2, 3).
They should be of float datatype.
Below I have shown a bit of demo code (note: in newer versions of scikit-image, compare_ssim has been moved to skimage.metrics.structural_similarity).
import numpy as np
from skimage.metrics import structural_similarity
img1 = np.random.randint(0, 255, size=(200, 200, 3)).astype(np.float32)
img2 = np.random.randint(0, 255, size=(200, 200, 3)).astype(np.float32)
ssim_score = structural_similarity(img1, img2, multichannel=True) #score: 0.0018769083894301646
ssim_score = structural_similarity(img1, img1, multichannel=True) #score: 1.0
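If the goal is to apply SSIM to the two 2xN coordinate sets from the question, one possible interpretation (an assumption, not part of the answer above) is to rasterize each point set into a small grayscale image first; the rasterize helper and the 10x10 canvas are illustrative choices:
import numpy as np
from skimage.metrics import structural_similarity

X = np.array([[1, 2, 3], [4, 5, 6]])
Y = np.array([[3, 4, 5], [5, 6, 7]])

def rasterize(coords, shape=(10, 10)):
    """Turn a 2xN array of (x, y) coordinates into a binary grayscale image."""
    img = np.zeros(shape, dtype=np.float32)
    img[coords[1], coords[0]] = 1.0   # row index = y, column index = x
    return img

score = structural_similarity(rasterize(X), rasterize(Y), data_range=1.0)
print(score)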
I'm working with GANs on the Single Image Super-Resolution (SISR) problem at 4x scaling. I am using residual learning techniques, so what I get back from the trained network is a tensor containing the estimated residual image between the upscaled input image and the target image. I feed the network with normalized numpy arrays representing the images (np.asarray(image) / 255).
In order to get the final estimate image, then, I have to sum the upscaled input image with the residual image. Here is the code I use (input image's size is 64x64 while the output has size 256x256):
net.eval()
img = Image.open(image_folder + 'lr/' + image_name)
tens = transforms.ToTensor()
toimg = transforms.ToPILImage()
input = tens(img)
bicub_res = tens(img.resize((img.size[0] * 4, img.size[1] * 4), Image.BICUBIC))
input = input.view((1, 3, 64, 64))
output = net(input)
output = torch.add(bicub_res, output).clamp(0, 255)
output = output.view((3, 256, 256))
output = toimg(output)
Now, given the low-resolution image, the high-resolution target, and the residual (network output): if I sum the low-resolution image with the residual image as shown in the code, the result seems a bit too dark. Given that the data structures are numpy arrays, I've tried stretching the array values back to the range (0, 255) and then converting back to an image; that result is a bit brighter than before, but still very dark. What am I doing wrong? How can I get my image back?
EDIT: I will answer my own question: the problem was a constant per-channel value that I forgot to add back.
Nonetheless, I have another question: after recovering the right images, I noticed some kind of noise on each image. Looking at other images, like the baby, I noticed that it is a 3x3 grid of nine repetitions of a faint "watermark" of the image itself. This pattern is the same for every picture, no matter what I do or how I train the network.
Why do I see these artifacts?
So, I solved both my questions. For future reference:
The first question was a mistake in my code: when I train the network, I subtract a constant value per channel, PER_CHANNEL_MEANS = np.array([0.47614917, 0.45001204, 0.40904046]). When it came to getting the image back, I didn't add those values back, and since they are fixed per channel, this resulted in a brightness shift.
My second question was even harder, because the problem wasn't my code or my network but how I used numpy: reshaping an array from (3, 256, 256) to (256, 256, 3) does not transpose the axes, it only reinterprets the flat data, which scrambles the channels and produces the repeated-pattern artifacts. To solve it, I used:
output = torch.add(output, torch.from_numpy(PER_CHANNEL_MEANS).float().view(1, 3, 1, 1)).clamp(0, 255)
o = output.view((3, 256, 256))
o = o.data.numpy()
o = np.swapaxes(o, 0, 1)
o = np.swapaxes(o, 1, 2)
it's not an elegant way, but it does the job.
ADDENDUM: At this point I had solved my two problems, but I had another one, which can be seen very easily in the last image of my post: some pixels shifted to completely wrong colors.
To turn an array a into an image, I used a.astype(np.uint8), without being aware that when a value v exceeds np.uint8's maximum (255) it wraps around (effectively np.mod(v, 256)) instead of being clipped. This caused the color shifting, which I solved following the answer to that question.
Please feel free to suggest a more elegant solution to the second problem, and I will edit it in.
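As one possibly more elegant variant of the fixes above (a sketch only, with a random tensor standing in for the network output after the means are added back): a single np.transpose replaces the two swapaxes calls, and clipping before the uint8 cast avoids the wrap-around from the addendum.
import numpy as np
import torch

output = torch.rand(1, 3, 256, 256) * 255        # stand-in for the recovered network output

o = output.view(3, 256, 256).detach().numpy()
o = np.transpose(o, (1, 2, 0))                   # CHW -> HWC in one step instead of two swapaxes
img_array = np.clip(o, 0, 255).astype(np.uint8)  # clip so out-of-range values saturate instead of wrapping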
I have the following code that reads an image with opencv and displays it:
import cv2
import matplotlib.pyplot as plt
img = cv2.imread('imgs_soccer/soccer_10.jpg',cv2.IMREAD_COLOR)
img = cv2.resize(img, (128, 128))
plt.imshow(img)
plt.show()
I want to generate some random images by using keras so I define this generator:
from keras.preprocessing.image import ImageDataGenerator

image_gen = ImageDataGenerator(rotation_range=15,
                               width_shift_range=0.1,
                               height_shift_range=0.1,
                               shear_range=0.01,
                               zoom_range=[0.9, 1.25],
                               horizontal_flip=True,
                               vertical_flip=False,
                               fill_mode='reflect',
                               data_format='channels_last',
                               brightness_range=[0.5, 1.5])
but, when I use it in this way:
image_gen.flow(img)
I get this error:
'Input data in `NumpyArrayIterator` should have rank 4. You passed an array with shape', (128, 128, 3))
And it seems obvious to me: RGB, an image, of course it has 3 dimensions!
What am I missing here?
The documentation says that it wants a 4-dimensional array, but does not specify what I should put in the 4th dimension.
And how should this 4-dimensional array be made? I currently have (width, height, channels); does this 4th dimension go at the start or at the end?
I am also not very familiar with numpy: how can I alter the existing img array to add a 4th dimension?
Use np.expand_dims():
import numpy as np
img = np.expand_dims(img, 0)
print(img.shape) # (1, 128, 128, 3)
The first dimension specifies the number of images (in your case 1 image).
Alternatively, you can use numpy.newaxis or None for promoting your 3D array to 4D as in:
img = img[np.newaxis, ...]
# or use None
img = img[None, ...]
The first dimension is usually the batch_size. This gives you a lot of flexibility when you want to fully utilize modern hardware such as GPUs, as long as your tensor fits in GPU memory. For example, you can pass 64 images at once by stacking them along the first dimension; in that case your 4D array would have shape (64, width, height, channels).
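A brief follow-up sketch, reusing image_gen and the expanded img defined above (how they are combined is my assumption, not part of the original question): with the batch axis in place, flow() yields augmented batches directly.
batch = next(image_gen.flow(img, batch_size=1))   # one augmented image per batch
print(batch.shape)                                # (1, 128, 128, 3)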