Restart mean calculation of blobFromImage - python

I'm training a YOLO model using cv2.dnn and blobFromImage. I have a DataFrame (df) with all the image paths, which I iterate over to obtain the features through blobFromImage. So far, I have this:
for i in df.iloc:
    img = cv2.imread(str(i[8]))
    height, width, channels = img.shape
    # extract features: normalize, resize to 416x416 and swap BGR -> RGB
    blob = cv2.dnn.blobFromImage(img, 1/255, (416, 416), (0, 0, 0), swapRB=True, crop=False)
    print(blob.shape)
    net = cv2.dnn.readNet(path_cfg, path_weights)
    layer_names = net.getLayerNames()
    outputlayers = [layer_names[i[0] - 1] for i in net.getUnconnectedOutLayers()]
    net.setInput(blob)
    outs = net.forward(outputlayers)
All my images have shape (1024, 1024, 3). When I pass the df through the code, blob.shape is (1, 3, 416, 416) in the majority of cases. However, for some images the blob comes out with a different shape, such as (1, 3, 814, 450). The interesting thing is that if I create a df1 with only that specific image path and pass it into the loop, the blob shape turns out correctly as (1, 3, 416, 416). Therefore, I'm assuming that it takes some values from the previously passed images.
I would highly appreciate any help explaining why this is happening and how to solve it, so that all blobs have shape (1, 3, 416, 416).
Many thanks in advance.
I expect all blobs to have shape (1, 3, 416, 416); some turn out different, although all the original images have the same shape.
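For reference, a minimal standalone check (a sketch; the path below is hypothetical) of what blobFromImage should do: with a fixed size argument it resizes every input to that size, so a single 1024x1024 image should always yield a (1, 3, 416, 416) blob.

import cv2

# hypothetical path to one of the 1024x1024 source images
path_img = "images/sample_1024.png"

img = cv2.imread(path_img)
assert img is not None, "imread returned None; check the path"
print(img.shape)   # expected: (1024, 1024, 3)

# size=(416, 416) fixes the spatial dimensions of the blob,
# regardless of the input image's original shape
blob = cv2.dnn.blobFromImage(img, 1 / 255, (416, 416), (0, 0, 0),
                             swapRB=True, crop=False)
print(blob.shape)  # expected: (1, 3, 416, 416)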

How to correctly turn a numpy array into an image using residuals?

I'm working with GANs on the Single Image Super-Resolution (SISR) problem at 4x scaling. I am using residual learning techniques, so what I get back from the trained network is a tensor containing the estimated residual image between the upscaled input image and the target image. I feed the network with normalized numpy arrays representing the images (np.asarray(image) / 255).
In order to get the final estimated image, then, I have to sum the upscaled input image with the residual image. Here is the code I use (the input image's size is 64x64 while the output has size 256x256):
net.eval()
img = Image.open(image_folder + 'lr/' + image_name)
tens = transforms.ToTensor()
toimg = transforms.ToPILImage()
input = tens(img)
# bicubic upscaling of the 64x64 input to the 256x256 target size
bicub_res = tens(img.resize((img.size[0] * 4, img.size[1] * 4), Image.BICUBIC))
input = input.view((1, 3, 64, 64))
output = net(input)
# add the predicted residual to the bicubic upscale to get the final estimate
output = torch.add(bicub_res, output).clamp(0, 255)
output = output.view((3, 256, 256))
output = toimg(output)
Now, looking at the low-resolution input, the high-resolution target and the residual (network output): if I sum the low-resolution image with the residual image as shown in the code, the image I get seems a bit too dark. Given that the data structures are numpy arrays, I've tried to stretch the array values back to the range (0, 255) and then convert the result back to an image. In this case I get an image that is a bit brighter than before, but still very dark. What am I doing wrong? How can I get my image back?
EDIT: I will answer my own question: the problem was a per-channel constant that I forgot to add back.
Nonetheless, I have another question to ask: after recovering the right images, I noticed some kind of noise on each image, and looking at other images, like the baby, I noticed that it is a repetition, 9 times on a 3x3 grid, of some kind of "watermark self-image". This pattern is the same for every picture, no matter what I do or how I train the network.
Why do I see these artifacts?
So, I solved both of my questions. For future reference:
The first problem was a mistake in my code: when I train the network, I subtract a constant value per channel, PER_CHANNEL_MEANS = np.array([0.47614917, 0.45001204, 0.40904046]). When it came to getting the image back, I didn't add that value back, and since the values are fixed per channel, the result was a uniform brightness shift.
My second problem was even harder, because the issue wasn't my code or my network but the array handling: reshaping an array from (3, 256, 256) to (256, 256, 3) does not reorder the axes the way a transpose does, it only reinterprets the memory layout, so the channel data gets scrambled and the colours shift. To solve it, I used:
# add the per-channel means back; view as (1, 3, 1, 1) so they broadcast over the spatial dims
output = torch.add(output, torch.from_numpy(PER_CHANNEL_MEANS).float().view((1, 3, 1, 1))).clamp(0, 255)
o = output.view((3, 256, 256))
o = o.data.numpy()
# move the channel axis to the end: (3, 256, 256) -> (256, 256, 3)
o = np.swapaxes(o, 0, 1)
o = np.swapaxes(o, 1, 2)
it's not an elegant way, but it does the job.
ADDENDUM: at this point I had solved my two problems, but I had another one, which can be noticed very easily in the last image of my post: some pixels shifted to completely wrong colours.
To turn an array a into an image I used a.astype(np.uint8), without being aware that if a value v falls outside np.uint8's range, the cast wraps around modulo 256 instead of saturating at 255. This caused the colour shifting, which I solved following the answer to that question.
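For reference, a minimal sketch (with a toy array standing in for the real data) of clipping before the cast, which avoids the wrap-around:

import numpy as np

a = np.array([[-3.2, 120.7, 300.5]])          # toy values outside [0, 255]
img_u8 = np.clip(a, 0, 255).astype(np.uint8)  # clip first, then cast
print(img_u8)                                 # [[  0 120 255]]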
Please feel free to suggest a more elegant solution to my second problem, and I will edit it in.
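For reference, the two swapaxes calls are equivalent to a single transpose that moves the channel axis to the end; a minimal sketch with a stand-in array:

import numpy as np

o = np.random.rand(3, 256, 256).astype(np.float32)  # stand-in for the network output

# (C, H, W) -> (H, W, C), same result as swapaxes(0, 1) followed by swapaxes(1, 2)
o_hwc = np.transpose(o, (1, 2, 0))
print(o_hwc.shape)  # (256, 256, 3)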

How does cv2.imencode work? Different output shapes for two images of the same shape

I have two images with the same shape, and using cv2.imencode I got two arrays with different shapes. Why is this? How can I get encoded images of the same shape?
print(img1.shape)
OUTPUT: (720, 1280, 3)
print(img2.shape)
OUTPUT: (720, 1280, 3)
# imencode returns a (success_flag, buffer) tuple
ret1, img1_encoded = cv2.imencode('.png', img1)
ret2, img2_encoded = cv2.imencode('.png', img2)
print(img1_encoded.shape)
OUTPUT: (927851, 1)
print(img2_encoded.shape)
OUTPUT: (73513, 1)
The function imencode compresses an image and stores it in a memory buffer that is sized to fit the result.
img.shape returns the dimensions of the image and the number of channels; in this case both of your images have 3 channels, indicating that they are colour images.
In layman's terms, how well an image compresses depends on the frequency of particular colour components within it.
Given that you are encoding different images, the encoded buffers will generally have different sizes.
http://www.libpng.org/pub/png/book/chapter09.html - here is a link explaining how PNG compression works.
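As a small illustration (a sketch with synthetic images), two arrays of identical shape can compress to very different buffer sizes, and imdecode recovers the original shape from either buffer:

import cv2
import numpy as np

flat = np.zeros((720, 1280, 3), dtype=np.uint8)               # trivially compressible
noisy = np.random.randint(0, 256, (720, 1280, 3), np.uint8)   # hard to compress

ok1, buf_flat = cv2.imencode('.png', flat)
ok2, buf_noisy = cv2.imencode('.png', noisy)
print(buf_flat.shape, buf_noisy.shape)   # two very different (N, 1) buffer shapes

restored = cv2.imdecode(buf_noisy, cv2.IMREAD_COLOR)
print(restored.shape)                    # (720, 1280, 3)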

TensorFlow Dataset adding tiled images to batch dimension

If I have a dataset of images which I have split into tiles, what is the best way to combine the tile dimension with the batch dimension?
For example, my input files are of shape (300, 300, 3), a typical RGB image with 300x300 pixels.
I do preprocessing and create a tiled dataset, which has a new shape: (?, 100, 128, 128, 3).
So I have created 100 tiles of size 30x30 from the original image, reshaped each tile to 128x128 pixels, then cached the dataset and created a batch with dimension ?.
Now I want to fold the tiles into the batch dimension and get a shape of (?, 128, 128, 3).
I've tried mapping the dataset to this function:
def reshape_image(image_batch):
    return tf.reshape(image_batch, (-1, 128, 128, 3))
But this doesn't seem to work: it causes the iterator to hang on this call:
image_test = next(iter(image_ds))
As I thought, the answer was fairly simple if you are familiar with the TensorFlow operations. Hopefully this question wasn't too confusing, and it helps someone out there.
# load/preprocess images from paths
image_ds = path_ds.map(load_and_preprocess_image, num_parallel_calls=AUTOTUNE)

# split images into tiles: (X, Y, C) -> (N, X, Y, C) where N is the number of tiles
image_ds = image_ds.map(split_image, num_parallel_calls=AUTOTUNE)

# resize tiled images from 30x30 to 128x128; the implementation doesn't really matter
image_ds = image_ds.map(resize_image, num_parallel_calls=AUTOTUNE)

# finally the answer: use 'flat_map', 'unstack' and 'from_tensor_slices'
# tiled_images is of shape (N, X, Y, C)
def flat_map_impl(tiled_images):
    # return a new Dataset:
    # unstack by default creates a list of tensors along the first dimension,
    # so tf.unstack(tiled_images) is a list of N tensors of shape (X, Y, C);
    # from_tensor_slices then creates a new dataset where each element has shape (X, Y, C)
    return tf.data.Dataset.from_tensor_slices(tf.unstack(tiled_images))

# call flat_map_impl with flat_map on the dataset
image_ds = image_ds.flat_map(flat_map_impl)
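A more compact alternative in recent TensorFlow versions (a sketch, assuming each dataset element has shape (N, 128, 128, 3)) is Dataset.unbatch(), which also folds the leading tile dimension into the element stream:

import tensorflow as tf

# toy dataset: 2 "images", each already split into 4 tiles of 128x128x3
tiled = tf.data.Dataset.from_tensor_slices(tf.zeros((2, 4, 128, 128, 3)))
flat = tiled.unbatch()          # elements now have shape (128, 128, 3)
print(flat.element_spec.shape)  # (128, 128, 3)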

Topography height prediction from 2D image

I would like to train on 2D images together with the corresponding per-pixel height (topography) information. I have a set of 2D images taken from a topography where the height of each pixel is also known. Is there any way I can use deep learning to train on the images with the pixel-height information?
I have already tried to infer some features from the images and pixel heights and relate them with regression methods such as SVM, but I have not yet obtained satisfactory results for predicting the pixel heights of new images.
How about using the pixel-height values as labels and the images (RGB, I assume, so 3 channels) as the training set? Then you can just run supervised learning. Although I am not sure how you could recover height just by looking at an image; even humans would have trouble doing that, even after seeing many images. I think you would need some kind of reference point.
To convert an image into a 3D array of values (the 3rd dimension being the colour channels):
from keras.preprocessing import image
# loads RGB image as PIL.Image.Image type
img = image.load_img(img_file_path, target_size=(120, 120))
# convert PIL.Image.Image type to 3D tensor with shape (120, 120, 3)
x = image.img_to_array(img)
There are a number of other ways too: Convert an image to 2D array in python
In terms of assigning labels to the images (here the labels are the pixel heights), it would be as simple as creating your training set x_train of shape (nb_images, 120, 120, 3) and labels y_train of shape (nb_images, 120, 120, 1), and running supervised learning on these until, for each image in x_train, the model can predict each corresponding value in the height map y_train within a certain error.
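A minimal sketch of such a setup (illustrative only; the layer sizes and the toy data below are assumptions, not part of the original answer) could be a small fully convolutional network that maps (120, 120, 3) inputs to (120, 120, 1) height maps:

import numpy as np
from keras.models import Sequential
from keras.layers import Conv2D

# toy stand-ins for the real data: RGB inputs and per-pixel height labels
nb_images = 8
x_train = np.random.rand(nb_images, 120, 120, 3).astype(np.float32)
y_train = np.random.rand(nb_images, 120, 120, 1).astype(np.float32)

# fully convolutional, so the 120x120 spatial dimensions are preserved end to end
model = Sequential([
    Conv2D(16, 3, padding='same', activation='relu', input_shape=(120, 120, 3)),
    Conv2D(16, 3, padding='same', activation='relu'),
    Conv2D(1, 1, padding='same'),   # one regression output per pixel
])
model.compile(optimizer='adam', loss='mse')
model.fit(x_train, y_train, epochs=1, batch_size=4)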

Show image from fetched data using openCV

I've been using datasets from sklearn, and I want to show an image from 'MNIST original' using cv2.imshow. Here is part of my code:
dataset = datasets.fetch_mldata('MNIST original')
features = np.array(dataset.data, 'int16')
labels = np.array(dataset.target, 'int')
list_hog_fd = []
deskewed_images = []
for img in features:
    cv2.imshow("digit", img)
    deskewed_images.append(deskew(img))
"digit" window appears but it is definitely not an digit image. How can I access real image from dataset?
Shape
MNIST image datasets are generally distributed and used as 1D vectors of 784 values.
However, in order to show one as an image, you need to convert it to a 2D matrix of 28x28 values.
Simply using img = img.reshape(28, 28) might work in your case.
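For reference, a minimal sketch of that fix (assuming features holds the flat 784-value rows as in the question):

import cv2
import numpy as np

# each row of features is a flat 784-vector; reshape it to 28x28 for display
img = features[0].reshape(28, 28).astype(np.uint8)
cv2.imshow("digit", img)
cv2.waitKey(0)  # imshow needs a waitKey call for the window to be drawn
cv2.destroyAllWindows()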
