I tried to make an algorithm using Teachable Machine that receives a picture and checks whether it falls under one of two categories (e.g. dogs or humans). However, after exporting the code it generated, I couldn't figure out how to turn the results, which come back as an array, into something anyone can understand. So far it only shows a list of two numbers (e.g. [[0.00058185 0.99941814]], the first number being dogs and the second one humans). I want it to show which number corresponds to dog and which to human, together with the percentage for each, or else to show only the most probable class.
Here's the code:
import tensorflow.keras
from PIL import Image, ImageOps
import numpy as np
# Disable scientific notation for clarity
np.set_printoptions(suppress=True)
# Load the model
model = tensorflow.keras.models.load_model('keras_model.h5')
# Create the array of the right shape to feed into the keras model
# The 'length' or number of images you can put into the array is
# determined by the first position in the shape tuple, in this case 1.
data = np.ndarray(shape=(1, 224, 224, 3), dtype=np.float32)
# Replace this with the path to your image (converted to RGB in case it has an alpha channel)
image = Image.open('test_photo.jpg').convert('RGB')
# Resize the image to 224x224 with the same strategy as in TM2:
# resize the image to be at least 224x224 and then crop from the center
size = (224, 224)
# Image.LANCZOS is the filter formerly exposed as Image.ANTIALIAS (removed in Pillow 10)
image = ImageOps.fit(image, size, Image.LANCZOS)
# Turn the image into a numpy array
image_array = np.asarray(image)
# Display the resized image
image.show()
# Normalize the image to roughly [-1, 1]
normalized_image_array = (image_array.astype(np.float32) / 127.0) - 1
# Load the image into the array
data[0] = normalized_image_array
# Run the inference
prediction = model.predict(data)
print(prediction)
input('Press ENTER to exit')
Using np.argmax (the index of the most probable class) and np.max (its probability) does what you want:
"Prediction is {} with {}% probability".format(["dog", "human"][np.argmax(prediction)], round(np.max(prediction)*100,2))
'Prediction is human with 99.94% probability'
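To show both classes with their percentages rather than only the winner, the same argmax/max idea can be wrapped in a small helper. This is a minimal sketch, assuming prediction has shape (1, n_classes) as returned by model.predict for a single image, and that the label list (here ["dog", "human"]) is in the same order as the classes were defined in Teachable Machine; describe_prediction is a hypothetical name.

```python
import numpy as np

def describe_prediction(prediction, class_names):
    """Return the most probable class and a per-class percentage report.

    prediction: array of shape (1, n_classes), e.g. the output of model.predict.
    class_names: labels in the same order as the model's outputs (assumed).
    """
    probs = np.asarray(prediction)[0]          # first (and only) image in the batch
    percentages = {name: round(float(p) * 100, 2)
                   for name, p in zip(class_names, probs)}
    best = class_names[int(np.argmax(probs))]  # index of the highest probability
    return best, percentages

prediction = np.array([[0.00058185, 0.99941814]])
best, percentages = describe_prediction(prediction, ["dog", "human"])
for name, pct in percentages.items():
    print(f"{name}: {pct}%")
print(f"Most probable: {best}")
```

Returning the dictionary alongside the winning label lets you print either the full breakdown or just the top class.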
After running YOLOv8, the algorithm annotated the following picture (image: Density-Area).
My goal is to crop a large number of these pictures for further analysis: I want everything inside the bounding box saved and everything outside it removed.
I tried using torch, numpy, cv2, and PIL but haven't been successful.
import torch
import torchvision
from PIL import Image
# Load the image (converted to RGB so the model gets 3 channels)
image = Image.open("path to .jpg").convert("RGB")
# Define the model and download the pre-trained weights
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
# Set the model to evaluation mode
model.eval()
# Transform the image to a tensor
transform = torchvision.transforms.ToTensor()
image_tensor = transform(image)
# Make predictions on the image using the model (no gradients needed for inference)
with torch.no_grad():
    predictions = model([image_tensor])
# Extract the bounding boxes and object labels from the predictions
boxes = predictions[0]['boxes'].tolist()
labels = predictions[0]['labels'].tolist()
# Crop the image for each object detected
for i in range(len(boxes)):
    bbox = tuple(boxes[i])
    object_label = labels[i]
    object_image = image.crop(bbox)
The image is just an n-dimensional array (here a tensor), so you can use array indexing to perform the cropping operation you want.
For example, I assume your bounding boxes are of the form [xmin, ymin, xmax, ymax].
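import cv2
import numpy as np

for i in range(len(boxes)):
    object_label = labels[i]
    # box coordinates are floats; slicing needs whole pixel indices
    xmin, ymin, xmax, ymax = [int(round(v)) for v in boxes[i]]
    object_image = image_tensor
    # tensor layout is (channels, height, width): rows are y, columns are x
    crop = object_image[:, ymin:ymax, xmin:xmax]
    # permute color dimension last
    crop = crop.permute(1, 2, 0)
    # convert from tensor to numpy array
    crop = crop.numpy()
    # swap from RGB to BGR (per opencv convention) and scale ToTensor's [0, 1] floats to 8-bit
    crop = (crop[:, :, ::-1] * 255).astype(np.uint8)
    # save (the file name is only an example)
    cv2.imwrite(f"crop_{i}_label{object_label}.png", crop)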
I'm sure you could accomplish this working directly with the PIL image objects as well, but more generally, in response to your comment: no, you cannot crop an image without providing the coordinates of the cropping bounding box.
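Working directly with the PIL image could look like the sketch below. It assumes boxes in [xmin, ymin, xmax, ymax] pixel format, as above; crop_boxes is a hypothetical helper, and rounding plus clamping to the image bounds are choices of this sketch (Image.crop would otherwise pad out-of-bounds regions with black). It also checks that the PIL crop matches plain array indexing, where rows are y and columns are x.

```python
import numpy as np
from PIL import Image

def crop_boxes(image, boxes):
    """Crop a PIL image once per [xmin, ymin, xmax, ymax] box.

    Coordinates are rounded to whole pixels and clamped to the image bounds.
    """
    width, height = image.size
    crops = []
    for xmin, ymin, xmax, ymax in boxes:
        left = max(0, int(round(xmin)))
        top = max(0, int(round(ymin)))
        right = min(width, int(round(xmax)))
        bottom = min(height, int(round(ymax)))
        crops.append(image.crop((left, top, right, bottom)))
    return crops

# Synthetic 100x80 RGB image standing in for a real photo.
img = Image.fromarray(np.random.randint(0, 256, (80, 100, 3), dtype=np.uint8))
boxes = [[10.2, 20.7, 50.0, 60.4]]          # hypothetical detector output
crops = crop_boxes(img, boxes)
print(crops[0].size)                        # (width, height)

# The same region via array indexing: rows are y, columns are x.
arr = np.asarray(img)
print(np.array_equal(np.asarray(crops[0]), arr[21:60, 10:50]))
```

Each returned crop is an ordinary PIL image, so it can be written out with crop.save(...) under whatever naming scheme suits your analysis.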
Today I was trying to compress the image below, using sklearn's PCA algorithm in Python.
Because the image is RGB (3 channels), I first reshaped it into a 2D array. Then I applied the PCA algorithm to the data to compress the image. After compressing it, I inverted the PCA transformation and reshaped the approximated (decompressed) image back to its original shape.
However, when I tried to display the approximated image, I got this weird result:
While the image is stored correctly with the cv2.imwrite function, OpenCV fails to display it correctly using cv2.imshow. Do you have any idea why this might be happening?
My code is below:
from sklearn.decomposition import PCA
import cv2
import numpy as np
image_filepath = 'baby_yoda_image.jpg'
# Loading image from disk.
input_image = cv2.imread(image_filepath)
height = input_image.shape[0]
width = input_image.shape[1]
channels = input_image.shape[2]
# Reshaping image to perform PCA.
print('Input image shape:', input_image.shape)
#--- OUT: (533, 800, 3)
reshaped_image = np.reshape(input_image, (height, width*channels))
print('Reshaped Image:', reshaped_image.shape)
#--- OUT: (533, 2400)
# Applying PCA transformation to image. No whitening is applied to prevent further data loss.
n_components = 64
whitening = False
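# Pass keyword arguments: PCA's second positional parameter is copy, not whiten.
pca = PCA(n_components=n_components, whiten=whitening)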
compressed_image = pca.fit_transform(reshaped_image)
print('PCA Compressed Image Shape:', compressed_image.shape)
#--- OUT: (533, 64)
print('Variance retained:', np.around(np.sum(pca.explained_variance_ratio_), 2)*100, '%')
#--- OUT: 97.0 %
# Plotting images.
approximated_image = pca.inverse_transform(compressed_image)
approximated_original_shape_image = np.reshape(approximated_image, (height, width, channels))
cv2.imshow('Input Image', input_image)
cv2.imshow('Compressed Image', approximated_original_shape_image)
cv2.waitKey(0)
cv2.destroyAllWindows()
Thanks in advance.
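Finally, I found a solution to this problem, thanks to @fmw42. After the inverse transformation, the pixels were floating-point values, some negative and some above 255. cv2.imshow interprets floating-point images as being in the range [0, 1] (it multiplies them by 255), which is why the display looked wrong.
Luckily, OpenCV can handle this with a single line of code that converts the result to 8-bit: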
approximated_uint8_image = cv2.convertScaleAbs(approximated_original_shape_image)
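Note that cv2.convertScaleAbs takes the absolute value before saturating to 0-255, so a pixel that came out of the inverse transform as -5 is shown as 5 rather than 0. If clipping is the behavior you want, a NumPy alternative could look like this sketch (to_uint8_image is a hypothetical name):

```python
import numpy as np

def to_uint8_image(approx):
    """Round, clip to the valid 8-bit range and convert for cv2.imshow/imwrite."""
    return np.clip(np.rint(approx), 0, 255).astype(np.uint8)

approx = np.array([[-5.0, 100.4, 300.0]])   # values like those PCA can produce
print(to_uint8_image(approx))
```

Applied to approximated_original_shape_image, the result can be passed straight to cv2.imshow.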
I'm attempting to train a U-Net to assign a label to each pixel of a 256x256 image, similar to the tutorial given here. In the example, the predictions of the U-Net are a (128x128x3) output, where each of the 3 channels corresponds to one of the classes a pixel can be assigned. In my case, I need a (256x256x10) output with 10 different classes (essentially a one-hot encoded array for each pixel in the image).
I can load the images, but I'm struggling to convert each image's corresponding segmentation mask to the correct format. I have created Datasets by defining a map function called process_path, which takes a saved numpy representation of the mask and creates a tensor of dimension (256, 256, 10). However, I get a ValueError when I call model.fit, telling me that it cannot call as_list because the shape of the Tensor cannot be found:
import numpy as np
import tensorflow as tf
from tensorflow.keras import utils as kerasUtils
# --------------------------------------------------------------------------------------
# --------------------------------------------------------------------------------------
def decode_npy(npy):
    filename = npy.numpy()
    data = np.load(filename)
    data = kerasUtils.to_categorical(data, 10)
    return data
# --------------------------------------------------------------------------------------
# --------------------------------------------------------------------------------------
def decode_img(img):
    img = tf.image.decode_png(img, channels=3)
    return tf.image.convert_image_dtype(img, tf.float32)
# --------------------------------------------------------------------------------------
# input - path to an image file
# output - an input image and output mask
# --------------------------------------------------------------------------------------
def process_path(filePath):
    parts = tf.strings.split(filePath, '/')
    fileName = parts[-1]
    parts = tf.strings.split(fileName, '.')
    prefix = tf.convert_to_tensor(maskDir, dtype=tf.string)
    suffix = tf.convert_to_tensor("-mask.png", dtype=tf.string)
    maskFileName = tf.strings.join((parts[-2], suffix))
    maskPath = tf.strings.join((prefix, maskFileName), separator='/')
    # load the raw data from the file as a string
    img = tf.io.read_file(filePath)
    img = decode_img(img)
    mask = tf.py_function(decode_npy, [maskPath], tf.float32)
    return img, mask
trainDataSet = allDataSet.take(trainSize)
trainDataSet = trainDataSet.map(process_path).batch(4)
validDataSet = allDataSet.skip(trainSize)
validDataSet = validDataSet.map(process_path).batch(4)
How can I take each image's corresponding (256, 256, 3) segmentation mask (stored as a PNG) and convert it to a (256, 256, 10) tensor, where the i-th channel represents the pixel's value as in the tutorial? Can anyone explain how this is achieved, either in the process_path function or wherever it would be most efficient to perform the conversion?
Here is an example of a segmentation mask. Every mask contains the same 10 colours shown:
import numpy as np
from cv2 import imread
im = imread('hfoa7.png', 0) # read as grayscale to get 10 unique values
n_classes = 10
one_hot = np.zeros((im.shape[0], im.shape[1], n_classes))
for i, unique_value in enumerate(np.unique(im)):
    one_hot[:, :, i][im == unique_value] = 1
hfoa7.png is the name of the image you posted. This code snippet creates a one-hot matrix from the image.
You will want to insert this code into decode_npy(). However, since you posted a PNG, the code above won't work with an npy file; you could pass in the names of the PNGs instead of the npys. Don't worry about using kerasUtils.to_categorical: the code above already produces the categorical (one-hot) labels.
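The loop above can also be written without an explicit Python loop by letting np.unique return, for every pixel, the index of its gray value, and then indexing an identity matrix. This is a sketch under the same assumption as the answer (class i is the i-th smallest gray level); mask_to_one_hot is a hypothetical name, and raising when the number of gray levels differs from n_classes is a design choice of this sketch, since a mask missing one color would otherwise silently shift the class indices.

```python
import numpy as np

def mask_to_one_hot(mask, n_classes):
    """Convert a 2-D grayscale mask into an (H, W, n_classes) one-hot array.

    Class i corresponds to the i-th smallest gray value in the mask, matching
    the enumerate(np.unique(...)) loop.
    """
    values, inverse = np.unique(mask, return_inverse=True)
    if len(values) != n_classes:
        raise ValueError(f"expected {n_classes} gray levels, found {len(values)}")
    inverse = inverse.reshape(mask.shape)            # class index per pixel
    return np.eye(n_classes, dtype=np.float32)[inverse]

mask = np.array([[10, 20], [20, 30]], dtype=np.uint8)  # toy 2x2 mask with 3 classes
one_hot = mask_to_one_hot(mask, 3)
print(one_hot.shape)
```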
You can do this in pure TensorFlow; see my blog post: https://www.spacefish.biz/2020/11/rgb-segmentation-masks-to-classes-in-tensorflow/
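If the masks are kept in color, another option is to compare each pixel against a fixed palette, so class indices do not depend on which colors happen to appear in a given mask. Below is a minimal NumPy sketch that could serve as the body of the tf.py_function decoder; PALETTE is a hypothetical 3-color example (replace it with your 10 mask colors in class order), and the mask is assumed to be RGB (drop an alpha channel with mask[:, :, :3] if present). Separately, tf.py_function returns a tensor whose static shape is unknown, which is what typically triggers the as_list ValueError in model.fit; calling mask.set_shape((256, 256, 10)) on its result inside process_path restores that shape information.

```python
import numpy as np

# Hypothetical palette: one RGB color per class, in class order.
PALETTE = np.array([
    [0, 0, 0],
    [255, 0, 0],
    [0, 255, 0],
], dtype=np.uint8)

def rgb_mask_to_one_hot(mask_rgb, palette=PALETTE):
    """Map an (H, W, 3) RGB mask to an (H, W, n_classes) float32 one-hot array."""
    # Broadcast (H, W, 1, 3) against (1, 1, n_classes, 3) -> (H, W, n_classes, 3),
    # then require all three channels to match.
    matches = np.all(mask_rgb[:, :, None, :] == palette[None, None, :, :], axis=-1)
    if not matches.any(axis=-1).all():
        raise ValueError("mask contains a color that is not in the palette")
    return matches.astype(np.float32)

mask = np.array([[[0, 0, 0], [255, 0, 0]],
                 [[0, 255, 0], [0, 0, 0]]], dtype=np.uint8)
one_hot = rgb_mask_to_one_hot(mask)
print(one_hot.shape)
```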
I'm working on a deep learning project and have a lot of images that don't need to be in color. I saved them like this:
import matplotlib.pyplot as plt
plt.imsave('image.png', image, format='png', cmap='gray')
However, when I later checked the shape of the image, it still had three channels:
import cv2
img_rgb = cv2.imread('image.png')
print(img_rgb.shape)
So even though the image I see is grayscale, it still has 3 color channels. I realized I had to do some algebraic operations to convert those 3 channels into a single channel.
I have tried the methods described on the thread "How can I convert an RGB image into grayscale in Python?" but I'm confused.
For example, when I do the conversion using:
from skimage import color
from skimage import io
img_gray = color.rgb2gray(io.imread('image.png'))
plt.imsave('image_gray.png', img_gray, format='png')
However, when I load the new image and check its shape:
img_gr = cv2.imread('image_gray.png')
print(img_gr.shape)
I tried the other methods on that thread, but the results are the same: three channels. My goal is to have images with a (196, 256, 1) shape, given how much less computationally intensive that will be for a Convolutional Neural Network.
Any help would be appreciated.
Your first code block:
import matplotlib.pyplot as plt
plt.imsave('image.png', image, format='png', cmap='gray')
This is saving the image as RGB, because cmap='gray' is ignored when supplying RGB data to imsave (see pyplot docs).
You can convert your data into grayscale by combining the three bands, either with color.rgb2gray as you have (it uses a weighted sum of R, G and B rather than a plain average) or, as I tend to do, by averaging them with numpy:
import numpy as np
from matplotlib import pyplot as plt
import cv2
img_rgb = np.random.rand(196,256,3)
print('RGB image shape:', img_rgb.shape)
img_gray = np.mean(img_rgb, axis=2)
print('Grayscale image shape:', img_gray.shape)
RGB image shape: (196, 256, 3)
Grayscale image shape: (196, 256)
img_gray is now the correct shape. However, if you save it using plt.imsave, the file will still contain several bands (with cmap='gray', R == G == B for each pixel). This is not a limitation of PNG, which can store a single grayscale channel: plt.imsave maps the values through a colormap and writes an RGBA image, and cv2.imread loads images as 3-channel BGR by default.
plt.imsave('image_gray.png', img_gray, format='png', cmap='gray')
new_img = cv2.imread('image_gray.png')
print('Loaded image shape:', new_img.shape)
Loaded image shape: (196, 256, 3)
One way to avoid this is to save the images as numpy files, or indeed to save a batch of images as numpy files:
np.save('np_image.npy', img_gray)
new_np = np.load('np_image.npy')
print('new_np shape:', new_np.shape)
new_np shape: (196, 256)
The other thing you could do is save the grayscale PNG (using imsave) but then read it back with the grayscale flag (0, i.e. cv2.IMREAD_GRAYSCALE), which returns a single channel:
finalimg = cv2.imread('image_gray.png',0)
print('finalimg image shape:', finalimg.shape)
finalimg image shape: (196, 256)
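A third option is to write a genuinely single-channel PNG in the first place. This minimal sketch uses Pillow (installed alongside Matplotlib) instead of plt.imsave: a 2-D uint8 array becomes an 8-bit grayscale ('L' mode) image, reads back with two dimensions, and then gets the trailing channel axis the network expects. The random array and temporary path are stand-ins for your data.

```python
import os
import tempfile

import numpy as np
from PIL import Image

# Stand-in for a grayscale image with values in [0, 1].
img_gray = np.random.rand(196, 256)

# Scale to 0-255 and store as 8-bit; a 2-D uint8 array is saved as a grayscale PNG.
gray_u8 = (img_gray * 255).round().astype(np.uint8)
path = os.path.join(tempfile.mkdtemp(), "image_gray.png")
Image.fromarray(gray_u8).save(path)

loaded = np.asarray(Image.open(path))
print("Loaded image shape:", loaded.shape)

# Add the channel axis for a CNN input of shape (196, 256, 1).
cnn_input = loaded[..., np.newaxis]
print("CNN input shape:", cnn_input.shape)
```

Because PNG is lossless, the reloaded array is identical to gray_u8.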
As it turns out, Keras, the deep-learning library I'm using has its own method of converting images to a single color channel (grayscale) in its image pre-processing step.
When using the ImageDataGenerator class the flow_from_directory method takes the color_mode argument. Setting color_mode = "grayscale" will automatically convert the PNG into a single color channel!
Hope this helps someone in the future.
If you instead want to add extra channels that have the same value as the grayscale one, for example to use a specific model that requires a 3-channel input_shape:
Let's say your pictures are 28 x 28, so you have a shape of (28, 28, 1).
import numpy as np

def add_extra_channels_to_pic(pic):
    if pic.shape == (28, 28, 1):
        pic = pic.reshape(28, 28)
    pic = np.array([pic, pic, pic])
    # move the channel axis to the end
    pic = np.moveaxis(pic, 0, -1)
    return pic
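The same idea can be generalized to any image size with np.repeat along the last axis. This is a sketch; to_three_channels is a hypothetical name, and rejecting inputs that already have more than one channel is a choice made here.

```python
import numpy as np

def to_three_channels(pic):
    """Repeat a single gray channel three times: (H, W) or (H, W, 1) -> (H, W, 3)."""
    if pic.ndim == 2:
        pic = pic[..., np.newaxis]           # (H, W) -> (H, W, 1)
    if pic.shape[-1] != 1:
        raise ValueError(f"expected one channel, got shape {pic.shape}")
    return np.repeat(pic, 3, axis=-1)

pic = np.random.rand(28, 28, 1)
rgb = to_three_channels(pic)
print(rgb.shape)
```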
Try this method
import imageio
new_data = imageio.imread("file_path", as_gray =True)
imageio.imsave("file_path", new_data)
The optional argument as_gray=True in the imread call does the actual conversion.
I've been using datasets from sklearn, and I want to show an image from 'MNIST original' using OpenCV's imshow.
Here is part of my code:
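import cv2
import numpy as np
from sklearn import datasets
# fetch_mldata was removed in scikit-learn 0.22; fetch_openml serves the same MNIST data
dataset = datasets.fetch_openml('mnist_784', version=1, as_frame=False)
features = np.array(dataset.data, 'int16')
labels = np.array(dataset.target, 'int')
list_hog_fd = []
deskewed_images = []
for img in features:
    cv2.imshow("digit", img)
    cv2.waitKey(0)
The "digit" window appears, but it is definitely not a digit image. How can I access the real image in the dataset?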
MNIST image datasets are generally distributed and used as 1D vectors of 784 values.
However, to show one as an image, you need to convert it to a 2D matrix of 28x28 values.
Simply using img = img.reshape(28, 28) might work in your case; converting it to uint8 as well (img.reshape(28, 28).astype(np.uint8)) lets cv2.imshow display the 0-255 pixel values directly.
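Put together, the conversion could look like this sketch. vector_to_digit and its scale parameter are hypothetical conveniences: scale enlarges each pixel into a scale x scale block, because a 28x28 window is tiny on most screens. The fake row stands in for one entry of features.

```python
import numpy as np

def vector_to_digit(vec, scale=1):
    """Turn a flat 784-value MNIST row into a 28x28 uint8 image.

    scale (hypothetical convenience parameter) enlarges each pixel into a
    scale x scale block so the digit is easier to see in a window.
    """
    img = np.asarray(vec).reshape(28, 28).astype(np.uint8)
    if scale > 1:
        img = img.repeat(scale, axis=0).repeat(scale, axis=1)
    return img

row = np.arange(784) % 256     # fake row standing in for features[0]
digit = vector_to_digit(row, scale=4)
print(digit.shape, digit.dtype)
```

With OpenCV, the result can then be shown with cv2.imshow("digit", vector_to_digit(features[0], scale=4)) followed by cv2.waitKey(0).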