I would like to train on 2D images together with the corresponding per-pixel height (topography) information. I have a set of 2D images taken from a topography where the height of each pixel is also known. Is there any way I can use deep learning to train on the images together with the pixel height information?
I have already tried to infer some features from the images and pixel heights and relate them with regression methods such as SVM, but I have not yet obtained satisfactory results when predicting the pixel heights of new images.
How about using the pixel height values as labels and the images (RGB I assume, so 3 channels) as the training set? Then you can just run supervised learning. Although I am not sure how you could recover height just by looking at an image; even humans would have trouble doing that after seeing many images. I think you would need some kind of reference point.
To convert an image into a 3D array of values (the 3rd dimension holds the color channels):
from keras.preprocessing import image
# loads RGB image as PIL.Image.Image type
img = image.load_img(img_file_path, target_size=(120, 120))
# convert PIL.Image.Image type to 3D tensor with shape (120, 120, 3)
x = image.img_to_array(img)
There are a number of other ways too: Convert an image to 2D array in python
In terms of assigning labels to images (here the labels are the pixel heights), it would be as simple as creating your training set x_train with shape (nb_images, 120, 120, 3) and labels y_train with shape (nb_images, 120, 120, 1), then running supervised learning on these until, for each image in x_train, the model predicts each corresponding value in the height set y_train within a certain error.
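As a rough illustration of that setup (a minimal sketch, not a tuned architecture; it assumes Keras, 120x120 inputs as above, and that x_train and y_train are already prepared), a small fully convolutional model could map each image to a height map of the same size:
# Minimal sketch: a tiny fully convolutional network for per-pixel height regression.
# Layer sizes are illustrative only.
from keras.models import Sequential
from keras.layers import Conv2D

model = Sequential([
    Conv2D(32, (3, 3), activation='relu', padding='same', input_shape=(120, 120, 3)),
    Conv2D(32, (3, 3), activation='relu', padding='same'),
    Conv2D(1, (1, 1), activation='linear')  # one height value per pixel
])
model.compile(optimizer='adam', loss='mse')  # mean squared error, since this is regression

# x_train: (nb_images, 120, 120, 3), y_train: (nb_images, 120, 120, 1)
model.fit(x_train, y_train, epochs=10, batch_size=16)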
I've got a task involving classification and object ROI.
So I have images and labels: a class and x1, y1, x2, y2 (a standard bounding box).
But the images differ in size. Is there a way to get the box coordinates after resizing?
What I mean is: I have an image 300 px high and 400 px wide with box coordinates (x1, y1, x2, y2). Before training my DL model I have to resize all images to the same width and height, for example 200*200. Is there a way to calculate the new box coordinates x1_new_after_resizing, y1_new_after_resizing, x2_new_after_resizing, y2_new_after_resizing?
And are there any tips on what H and W to choose for resizing? The mean of all images? The median?
Thanks!
If you want to get the new coordinates when going from an image of size orig_width x orig_height to new_width x new_height, you can scale the box coordinates in the following way:
width_scale = new_width/orig_width
height_scale = new_height/orig_height
x1_new = x1*width_scale
y1_new = y1*height_scale
x2_new = x2*width_scale
y2_new = y2*height_scale
You can plot these coordinates on the new image to check them if you like.
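For completeness, here is a minimal sketch that wraps this scaling in a function (the function name and the use of Pillow for the resize are my own choices, not taken from the question):
from PIL import Image

def resize_with_box(img_path, box, new_size=(200, 200)):
    # box is (x1, y1, x2, y2) in the original image's pixel coordinates
    img = Image.open(img_path)
    orig_width, orig_height = img.size
    new_width, new_height = new_size
    width_scale = new_width / orig_width
    height_scale = new_height / orig_height
    x1, y1, x2, y2 = box
    new_box = (x1 * width_scale, y1 * height_scale,
               x2 * width_scale, y2 * height_scale)
    return img.resize(new_size), new_box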
There is no fixed method for choosing the dimensions to resize images to. It depends on various factors such as the network, the GPU memory you have, the batch size, and the shape of the smallest/largest image in the dataset. Ideally, the images should not end up so small or so stretched that they become incomprehensible.
You can refer to this post to get an idea of image resizing.
I tried to make an algorithm using Teachable Machine that receives a picture and decides which of two categories it falls under (e.g. dogs or humans). But after I exported the generated code, I couldn't make sense of how to turn the results, which are returned as an array, into something anyone can understand. So far it only shows a list of two numbers (e.g. [[0.00058185 0.99941814]], the first number being dogs and the second humans). I want it to show which of the two numbers means dog and which means human, with the percentage of both, or to show only the most probable one.
Here's the code:
import tensorflow.keras
from PIL import Image, ImageOps
import numpy as np
from decimal import Decimal
# Disable scientific notation for clarity
np.set_printoptions(suppress=True)
# Load the model
model = tensorflow.keras.models.load_model('keras_model.h5')
# Create the array of the right shape to feed into the keras model
# The 'length' or number of images you can put into the array is
# determined by the first position in the shape tuple, in this case 1.
data = np.ndarray(shape=(1, 224, 224, 3), dtype=np.float32)
# Replace this with the path to your image
image = Image.open('test_photo.jpg')
#resize the image to a 224x224 with the same strategy as in TM2:
#resizing the image to be at least 224x224 and then cropping from the center
size = (224, 224)
image = ImageOps.fit(image, size, Image.ANTIALIAS)
#turn the image into a numpy array
image_array = np.asarray(image)
# display the resized image
image.show()
# Normalize the image
normalized_image_array = (image_array.astype(np.float32) / 127.0) - 1
# Load the image into the array
data[0] = normalized_image_array
# run the inference
prediction = model.predict(data)
print(prediction)
input('Press ENTER to exit')
Using argmax and max does what you want:
"Prediction is {} with {}% probability".format(["dog", "human"][np.argmax(prediction)], round(np.max(prediction)*100,2))
'Prediction is human with 99.94% probability'
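If you also want to print the percentage for both classes, something like this should work (the class order is assumed to be dogs first, then humans, matching your export):
class_names = ["dog", "human"]  # assumed to match the label order of the exported model
for name, prob in zip(class_names, prediction[0]):
    print("{}: {:.2f}%".format(name, prob * 100))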
I am quite new to deep learning and image segmentation tasks.
I want to train a 2D U-Net on 3D NIfTI data (CT scans) by taking the 50 center slices of every case. I saved the images and labels as PNG files, but the labels are completely black. My goal is to predict the tumor region (Y) given a CT scan slice (X).
What am I doing wrong?
My code:
# imports assumed from context (imsave could equally come from skimage.io)
from pathlib import Path
import numpy as np
import nibabel as nib
from matplotlib.pyplot import imsave

labels = []
for i in range(0, 100):
    seg = Path("/content/drive/My Drive/") / "case_{:05d}".format(i) / "segmentation.nii.gz"
    seg = nib.load(seg)
    seg = seg.get_data()
    n_i, n_j, n_k = seg.shape
    # keep the 50 center slices along the first axis
    seg = seg[int(((n_i-1)/2)-25):int(((n_i-1)/2)+25), :, :]
    for j in range(seg.shape[0]):
        labels.append(seg[j, :, :])

labels = np.array(labels)
for i in range(labels.shape[0]):
    label = labels[i, :, :]
    imsave('/content/drive/My Drive/labels/labels_slice_{:05d}.png'.format(i), label)
I do the same for the images and get the following .png file:
image_slice_00680
But for the labels I only get a completely black image.
The data types of the image and label are float64 and uint16, respectively.
Example segmentation nifti file
I checked your nifti file in both 3D Slicer and nipy and the file is perfectly fine. Out of 608 slices in this segmentation file, only slices 181-397 have positive values, so you are supposed to get completely black images for the rest.
This short snippet allows me to save the positive example at the 300th slice:
import nibabel
import matplotlib.pyplot as plt
seg = nibabel.load("D:/Downloads/segmentation.nii.gz")
data = seg.get_fdata()
layer = data[300,:,:]
plt.imsave("D:/Downloads/seg.png", layer, cmap='gray')
Let me know if you can replicate this using the code above.
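If you want to check for yourself which slices contain any labels, here is a quick sketch using the same data array as above:
import numpy as np

# maximum label value in each slice along the first axis
slice_max = data.max(axis=(1, 2))
nonzero_slices = np.where(slice_max > 0)[0]
print(nonzero_slices.min(), nonzero_slices.max())  # should report roughly 181 and 397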
Also, I know it was not part of the question, but you should strongly consider staying with the NIfTI (or NRRD) format instead of converting to PNG files.
1) When you save to PNG, you lose a lot of information from the CT scan. Basically, you are rescaling CT values that often range from -2000 to +2000 into a 0-255 pixel range.
2) Similarly with the segmentation masks: in your NIfTI files the segmented region is saved as "1" and the background as "0". When you save that to PNG, it gets rescaled to 0-255 and you will have to convert it back again for network training.
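If you do stick with PNG masks, here is a minimal sketch of that back-and-forth conversion (my own convention, not from the question: scale the 0/1 mask to 0/255 when saving and threshold it when loading; layer is a single 2D mask slice as in the snippet above):
import numpy as np
from PIL import Image

# save: scale the 0/1 mask to 0/255 so it is visible and stored cleanly as uint8
Image.fromarray((layer * 255).astype(np.uint8)).save('label.png')

# load: threshold back to a 0/1 mask before feeding it to the network
mask = (np.array(Image.open('label.png')) > 127).astype(np.uint8)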
I'm trying to create a training matrix for a CNN.
The images are a mix of RGB and grayscale.
I want to create something like
[ # of images, # features ]
The image size is:
1024 * 1024
Following is my code:
from skimage.transform import rescale, resize
from skimage import io
import numpy as np

features = np.empty((0, 1024 * 1024), np.float32)
imagePath = directoyPath+"/"+ imageName
image = io.imread(imagePath)
print(image.shape)
flatFeatures = np.reshape(image,(1,1024*1024))
print(flatFeatures.shape)
features = np.append(features, flatFeatures, axis=0)
print(features.shape)
The problem is that the RGB shape is (1024, 1024, 3).
How can I feed both the RGB and grayscale images into the features matrix?
Simply put, you have to feed in the RGB images after converting them to grayscale (or the other way round): you cannot pass images with different numbers of channels into a CNN. RGB images have 3 channels and grayscale images have 1, and the number of channels in a CNN's input layer must be specified; it cannot be dynamic. So you have to make sure every image has either 3 channels or 1.
For your purposes I would suggest converting your grayscale images to RGB using OpenCV's cvtColor (in Python: color = cv2.cvtColor(gray, cv2.COLOR_GRAY2BGR)). The image won't actually gain any color, but it will have 3 channels, allowing you to pass both the RGB and the grayscale (technically RGB, but still colorless) images together.
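A minimal sketch of that idea with OpenCV (the imagePaths list and the flattened 1024*1024*3 feature layout are assumptions based on your snippet, not the only way to do it):
import cv2
import numpy as np

features = np.empty((0, 1024 * 1024 * 3), np.float32)

for imagePath in imagePaths:  # imagePaths is assumed to be your list of image files
    image = cv2.imread(imagePath)  # the default flag loads 3 channels even for grayscale files
    if image.ndim == 2:  # defensive: force 3 channels if a single-channel image slips through
        image = cv2.cvtColor(image, cv2.COLOR_GRAY2BGR)
    flat = image.reshape(1, 1024 * 1024 * 3).astype(np.float32)
    features = np.append(features, flat, axis=0)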
I've been using datasets from sklearn, and I want to show an image from 'MNIST original' using OpenCV's imshow.
Here is part of my code
dataset = datasets.fetch_mldata('MNIST original')
features = np.array(dataset.data, 'int16')
labels = np.array(dataset.target, 'int')
list_hog_fd = []
deskewed_images = []
for img in features:
    cv2.imshow("digit", img)
    deskewed_images.append(deskew(img))
"digit" window appears but it is definitely not an digit image. How can I access real image from dataset?
Shape
MNIST image datasets are generally distributed and used as 1D vectors of 784 values.
However, in order to show one as an image, you need to convert it to a 2D matrix of 28*28 values.
Simply using img = img.reshape(28,28) might work in your case.
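For example, a small sketch of your loop with that fix (the uint8 cast and the waitKey call are my additions so cv2.imshow actually renders each digit; the deskew call is left as in your code):
for img in features:
    digit = img.reshape(28, 28).astype(np.uint8)  # 784-vector -> 28x28 image
    cv2.imshow("digit", digit)
    cv2.waitKey(1)  # give the window time to draw; use waitKey(0) to pause on each digit
    deskewed_images.append(deskew(img))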