If I have a dataset of images which I have split into tiles, what is the best way to combine the tile dimension with the batch dimension?
For example, my input files are of shape (300,300,3), a typical RGB image with 300x300 pixels.
I do preprocessing and create a tile dataset which creates a new shape: (?,100,128,128,3)
So I have created 100 tiles of size 30x30 from the original image, reshaped each tile to 128x128 pixels, and then cached the dataset and created a batch with dimension ?.
Now I want to combine the tiles into the batch dimension and get a shape of: (?,128,128,3)
I've tried mapping the dataset to this function:
def reshape_image(image_batch):
    return tf.reshape(image_batch, (-1,128,128,3))
But this doesn't seem to be working, as it causes the iterator to hang on this call:
image_test = next(iter(image_ds))
As I suspected, the answer is fairly simple if you are familiar with the TensorFlow dataset operations. Hopefully this question wasn't too confusing and it helps someone out there.
#load/preprocess images from paths
image_ds = path_ds.map(load_and_preprocess_image, num_parallel_calls=AUTOTUNE)
#split images into tiles so (X,Y,C) -> (N,X,Y,C) where N is the number of tiles
image_ds = image_ds.map(split_image, num_parallel_calls=AUTOTUNE)
#resize tiled images from 30x30 to 128x128, implementation doesn't really matter
image_ds = image_ds.map(resize_image, num_parallel_calls=AUTOTUNE)
#finally the answer!! use 'flat_map', 'unstack', and 'from_tensor_slices'
#tiled_images is of shape (N,X,Y,C)
def flat_map_impl(tiled_images):
    #You return a new Dataset
    #Unstack by default creates a list of tensors based on the first dimension
    #therefore tf.unstack(tiled_images) is a list of size N with (X,Y,C) shaped elements
    #finally from_tensor_slices creates a new dataset where each element is of shape (X,Y,C)
    return tf.data.Dataset.from_tensor_slices(tf.unstack(tiled_images))
#call flat_map_impl with flat_map on the dataset
image_ds = image_ds.flat_map(flat_map_impl)
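To round this off, here is a minimal sketch of batching the flattened dataset (BATCH_SIZE is an assumed constant); each element is now a single tile of shape (128,128,3), so batching gives the desired (?,128,128,3). Note that tf.data.Dataset.from_tensor_slices already slices along the first dimension, so the tf.unstack call is arguably optional, but it makes the intent explicit.
#each element of image_ds is now a single (128,128,3) tile
BATCH_SIZE = 32  #assumed value
image_ds = image_ds.batch(BATCH_SIZE)
image_ds = image_ds.prefetch(AUTOTUNE)
image_test = next(iter(image_ds))
print(image_test.shape)  #(32, 128, 128, 3) for a full batch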
I'm training a YOLO model using cv2.dnn and blobFromImage. I have a DataFrame with all the image paths, which I iterate over to obtain the features through blobFromImage. So far, I have this:
for i in df.iloc:
    img = cv2.imread(str(i[8]))
    height, width, shape = img.shape
    blob = cv2.dnn.blobFromImage(img, 1/255, (416,416), (0,0,0), True, crop = False) # extract features. Normalize and resize. Swap RGB colours
    print(blob.shape)
    net = cv2.dnn.readNet(path_cfg, path_weights)
    layer_names = net.getLayerNames()
    outputlayers = [layer_names[i[0] - 1] for i in net.getUnconnectedOutLayers()]
    net.setInput(blob)
    outs = net.forward(outputlayers)
All my images are of shape (1024, 1024, 3). When I pass the df into the code, blob.shape is (1, 3, 416, 416) in the majority of cases. However, for some images it comes out with another size, such as (1, 3, 814, 450). The interesting thing is that if I create a df1 with only that specific image path and pass it into the loop, the blob shape turns out correctly as (1, 3, 416, 416). Therefore, I'm assuming that it takes some values from the previously passed images.
I would highly appreciate any help explaining why this is happening and how to solve it, so that all blobs are of shape (1, 3, 416, 416).
Many thanks in advance
I expect all blobs to have (1,3,416,416) shape. Some turn out to be different, although all original images are of the same shape.
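A minimal sanity check might help narrow this down; it simply mirrors the loop above (assuming the image path really is in column 8 of the DataFrame) and reports any image that fails to load or any blob whose shape differs from the requested one:
for i in df.iloc:
    img = cv2.imread(str(i[8]))
    # cv2.imread returns None when a path is wrong or the file is unreadable
    if img is None:
        print("could not read", i[8])
        continue
    blob = cv2.dnn.blobFromImage(img, 1/255, (416, 416), (0, 0, 0), True, crop=False)
    if blob.shape != (1, 3, 416, 416):
        print("unexpected blob shape", blob.shape, "for", i[8], "image shape", img.shape)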
I tried to make an algorithm using Teachable Machine that receives a picture and checks whether it falls under one of two categories (e.g. dogs or humans), but after exporting the generated code I couldn't make sense of how to turn the results, which are given as an array, into something anyone can understand. So far it only shows a list of two numbers (e.g. [[0.00058185 0.99941814]], the first number being dogs and the second one humans). I want it to show which number means dog and which means human, along with the percentage of both, or to show only the most probable one.
Here's the code:
import tensorflow.keras
from PIL import Image, ImageOps
import numpy as np
from decimal import Decimal
# Disable scientific notation for clarity
np.set_printoptions(suppress=True)
# Load the model
model = tensorflow.keras.models.load_model('keras_model.h5')
# Create the array of the right shape to feed into the keras model
# The 'length' or number of images you can put into the array is
# determined by the first position in the shape tuple, in this case 1.
data = np.ndarray(shape=(1, 224, 224, 3), dtype=np.float32)
# Replace this with the path to your image
image = Image.open('test_photo.jpg')
#resize the image to a 224x224 with the same strategy as in TM2:
#resizing the image to be at least 224x224 and then cropping from the center
size = (224, 224)
image = ImageOps.fit(image, size, Image.ANTIALIAS)
#turn the image into a numpy array
image_array = np.asarray(image)
# display the resized image
image.show()
# Normalize the image
normalized_image_array = (image_array.astype(np.float32) / 127.0) - 1
# Load the image into the array
data[0] = normalized_image_array
# run the inference
prediction = model.predict(data)
print(prediction)
input('Press ENTER to exit')
Using argmax and max does what you want:
"Prediction is {} with {}% probability".format(["dog", "human"][np.argmax(prediction)], round(np.max(prediction)*100,2))
'Prediction is human with 99.94% probability'
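If you want to print the percentage for both classes rather than only the winner, a small extension of the same idea works (the ["dog", "human"] order is assumed to match the order of the labels used when training the model):
class_names = ["dog", "human"]  # assumed to match the training label order
probs = prediction[0]           # prediction has shape (1, 2)
for name, p in zip(class_names, probs):
    print("{}: {:.2f}%".format(name, p * 100))
best = np.argmax(probs)
print("Prediction is {} with {:.2f}% probability".format(class_names[best], probs[best] * 100))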
I would like to train on 2D images with the corresponding per-pixel height (topography) information. I have a bunch of 2D images taken from a topography where the height of each pixel is also known. Is there any way I can use deep learning to train on the images together with the pixel height information?
I have already tried to infer some features from the images and pixel heights and relate them with regression methods such as SVM, but I have not gotten satisfactory results yet for predicting the pixel heights of new images.
How about using the pixel height values as labels and the images (RGB I assume, so 3 channels) as the training set? Then you can just run supervised learning. Although I am not sure how you could recover height just by looking at an image; even humans would have trouble doing that, even after seeing many images. I think you would need some kind of reference point.
To convert an image into a 3D array of values (the 3rd dimension being the color channels):
from keras.preprocessing import image
# loads RGB image as PIL.Image.Image type
img = image.load_img(img_file_path, target_size=(120, 120))
# convert PIL.Image.Image type to 3D tensor with shape (120, 120, 3)
x = image.img_to_array(img)
There are a number of other ways too: Convert an image to 2D array in python
In terms of assigning labels to images (here the labels are the pixel heights), it would be as simple as creating your training set x_train of shape (nb_images, 120, 120, 3) and labels y_train of shape (nb_images, 120, 120, 1), then running supervised learning on these until, for each image in x_train, the model can predict each corresponding value in the height set y_train within a certain error.
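As a rough illustration only, here is a minimal sketch of a fully convolutional Keras model that maps a (120, 120, 3) image to a (120, 120, 1) height map; the layer sizes, optimizer and mean-squared-error loss are assumptions, not a tested architecture:
from keras.models import Sequential
from keras.layers import Conv2D
# 'same' padding keeps the 120x120 spatial size; the last layer outputs
# one channel, i.e. the predicted height for every pixel
model = Sequential([
    Conv2D(32, (3, 3), activation='relu', padding='same', input_shape=(120, 120, 3)),
    Conv2D(32, (3, 3), activation='relu', padding='same'),
    Conv2D(1, (3, 3), activation='linear', padding='same'),
])
model.compile(optimizer='adam', loss='mse')
# assuming x_train (nb_images, 120, 120, 3) and y_train (nb_images, 120, 120, 1)
# are built as described above
model.fit(x_train, y_train, epochs=10, batch_size=16)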
I've been using datasets from sklearn, and I want to show an image from 'MNIST original' using cv2.imshow.
Here is part of my code
dataset = datasets.fetch_mldata('MNIST original')
features = np.array(dataset.data, 'int16')
labels = np.array(dataset.target, 'int')
list_hog_fd = []
deskewed_images = []
for img in features:
    cv2.imshow("digit", img)
    deskewed_images.append(deskew(img))
"digit" window appears but it is definitely not an digit image. How can I access real image from dataset?
Shape
MNIST image datasets are generally distributed and used as 1D vectors of 784 values.
However, in order to show one as an image, you need to convert it to a 2D matrix of 28*28 values.
Simply using img = img.reshape(28,28) might work in your case.
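A minimal sketch of how the display loop could look; the cast to uint8 is an assumption, since the features were converted to 'int16' above and cv2.imshow expects 8-bit data for 0-255 pixel values:
for img in features:
    # reshape the flat 784-value vector to 28x28 and cast to uint8 for display
    digit = img.reshape(28, 28).astype('uint8')
    cv2.imshow("digit", digit)
    cv2.waitKey(0)  # wait for a key press before showing the next digit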
I have an issue with expanding the size of the sklearn digits dataset images from 8*8 to 32*32 pixels.
My approach is to take the 8*8 array, flatten it, and then expand it, i.e. enlarge it from 64 to 1024 pixels in total. Therefore I simply want to repeat the values along each row 16 times:
1. Create a new array (=newfeatures) with 1024 NaN values.
2. Replace every 16th value of the newfeatures array with the values of the original array, that is (0=0), (16=1), (32=2), ..., (1008=63).
3. Replace the remaining NaN values with fillna(method="ffill") to "expand" the original image to a 32*32 pixel image.
Therefore I use the following code:
#Load in the training dataset
digits=datasets.load_digits()
features=digits.data
targets=digits.target
#Plot original digit
ax[0].imshow(features[0].reshape((8,8)))
#Expand 8*8 image to a 32*32 image (64 to 1024)
newfeatures=np.ndarray((1797,16*len(features[0])))
newfeatures[:]=np.NaN
newfeatures=pd.DataFrame(newfeatures)
for row in range(1797):
    for i in range(0,63):
        newfeatures.iloc[row,16*i]=features[row][i]
newfeatures.fillna(method="ffill",axis=1,inplace=True)
#Plot expanded image with 32*32 pixels
ax[1].imshow(newfeatures.values[0].reshape((32,32)))
As you can see, the result is not as expected
You can use skimage's resize, as shown below:
from skimage import transform
new_features = np.array([
    transform.resize(
        img.reshape(8, 8),        #old shape
        (32, 32),                 #new shape
        mode='constant',
        preserve_range=True
    ).ravel()                     #flatten the resized image
    for img in features
])
new_features shape will be (1797, 1024), and displaying the first image shows the upscaled digit.
Based on the above solution, I think the following is a slightly neater way:
from skimage import transform
newfeatures = [transform.resize(features[i].reshape(8, 8), (32, 32))
               for i in range(len(features))]
plt.imshow(newfeatures[0].reshape((32,32)))
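As a side note on the original ffill idea: the result looks stretched because filling forward along the flattened row only repeats values horizontally, while the enlargement has to happen in both image dimensions. If a plain block-wise (nearest-neighbour) expansion is what's wanted rather than interpolation, a small sketch with np.repeat does exactly that:
#repeat every pixel 4 times along each axis: 8x8 -> 32x32
img8 = features[0].reshape(8, 8)
img32 = np.repeat(np.repeat(img8, 4, axis=0), 4, axis=1)
plt.imshow(img32)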