I have a set of images and their corresponding YOLO coordinates. Now I want to extract the objects that these YOLO coordinates denote into separate images.
But these coordinates are floating-point values, so I cannot use them directly for slicing.
This is an image (Sample Image), and the corresponding YOLO coordinates are:
labels = [0.536328, 0.5, 0.349219, 0.611111]
I read my image as follows :
image = cv2.imread('frame0.jpg')
Then I wanted to use something like image[y:y+h,x:x+w] as I had seen in a similar question. But the variables are float, so I tried to convert them into integers using the dimensions of the image 1280 x 720 like this :
object = [int(label[0]*720), int(label[1]*720), int(label[2]*1280), int(label[3]*1280)]
x,y,w,h = object
But it doesn't extract the right part of the image, as you can see here: extractedImage
This is part of my training dataset, and I cropped these parts earlier using some tools, so there should not be any errors in my labels. All the images are cropped incorrectly in this way; I have shown the output for one of them.
Thanks a lot in advance. Any suggestions would be really helpful !
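The labels need to be converted differently. YOLO stores each box as (x_center, y_center, width, height), all normalized to the range 0 to 1: x_center and width are fractions of the image width W, and y_center and height are fractions of the image height H. In your code the first two values are multiplied by 720 and the last two by 1280, so x and the height are scaled by the wrong dimension, and the box center is treated as its top-left corner. Multiply x and w by W, multiply y and h by H, then subtract half the box size from the center to get the top-left corner. Here's how I solved it:
import cv2
import matplotlib.pyplot as plt
label = [0.536328, 0.5, 0.349219, 0.611111]
img = cv2.imread('P6A4J.jpg')
H, W, _ = img.shape
# YOLO format: center x, center y, width, height, all normalized to 0..1
xc, yc, bw, bh = label
w, h = int(bw * W), int(bh * H)
# top-left corner = center minus half the box size
x, y = int(xc * W - w / 2), int(yc * H - h / 2)
plt.subplot(1,2,1)
plt.imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))  # OpenCV loads BGR, matplotlib expects RGB
plt.subplot(1,2,2)
plt.imshow(cv2.cvtColor(img[y:y+h, x:x+w], cv2.COLOR_BGR2RGB))
plt.show()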
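As a reusable variant, here is a minimal sketch that assumes the usual YOLO label convention (center x, center y, width, height, each normalized to 0..1). The function name yolo_to_pixel_box is hypothetical; it is not part of OpenCV or any YOLO package. It rounds to the nearest pixel and clamps the box to the image so the slice never runs out of bounds.

```python
def yolo_to_pixel_box(label, img_w, img_h):
    """Convert a normalized YOLO box (xc, yc, w, h) to pixel corners (x1, y1, x2, y2)."""
    xc, yc, w, h = label
    # Corners in pixels: center minus/plus half the box size, scaled by the image size
    x1 = round((xc - w / 2) * img_w)
    y1 = round((yc - h / 2) * img_h)
    x2 = round((xc + w / 2) * img_w)
    y2 = round((yc + h / 2) * img_h)
    # Clamp to the image bounds so the result is always a valid slice
    x1, x2 = max(0, x1), min(img_w, x2)
    y1, y2 = max(0, y1), min(img_h, y2)
    return x1, y1, x2, y2


# Example with the label from the question and a 1280 x 720 image
print(yolo_to_pixel_box([0.536328, 0.5, 0.349219, 0.611111], 1280, 720))
```

With an image loaded by cv2.imread, the crop is then image[y1:y2, x1:x2].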
Hope this helps!
If you are using YOLOv5, detect.py can save the crops for you:
python detect.py --save-crop
Crops will be saved under runs/detect/exp/crops, with a directory for each class detected.
https://github.com/ultralytics/yolov5/issues/5412
I'm trying to generate a 3D image using a stack of 2D grayscale images in Python. I currently have the images, mask, and mask output. I tried creating an ndarray by adding an axis to my images, but this didn't seem to work.
This is what I wrote:
# load images
images_gray = []
#x, y= images[0].shape
#z= len(frames)
#threeD= np.ndarray([x,y,z]) #3D
threeD=[]
for i in range(len(images)):
    frame= cv2.imread(path+'/images/' + str(i))
    #convert to grayscale then save
    images_gray.append(rgb2gray(frame))
    #create a polygon
    coordinates=coord[i]
    coordinates = [[y,x] for [x,y] in coordinates] #change order for polygon2mask
    polygon = np.array(coordinates)
    #create a mask
    mask= polygon2mask(images_gray[i].shape, polygon)
    #apply mask
    result=ma.masked_array(images_gray[i], np.invert(mask))
    temp=result[... ,np.newaxis]
    threeD.append(temp)
The resulting shape of threeD is (# of frames, image height, image width, 1). I don't know where the 1 comes from, and I expected the order to be (x, y, z) = (image height, image width, # of frames). The output is wrong, and I wasn't able to view it using plt because I got a type error saying invalid shape.
For the z, I thought about setting a value of 0.1 that would represent the thickness, not sure how to set that up.
I'm also not sure if my approach is correct or not; do I have to create a points clouds instead? mesh? any suggestions?
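One possible way to get the (image height, image width, # of frames) layout described above, sketched with small synthetic arrays standing in for the masked frames. It assumes every frame has the same shape. The trailing 1 comes from indexing with np.newaxis, which appends a length-1 axis to each frame; stacking the 2D frames along a new last axis avoids it. np.ma.stack is used so the masks are kept.

```python
import numpy as np

# Synthetic stand-ins for the masked grayscale frames (3 frames of 4 x 5 pixels)
frames = [
    np.ma.masked_array(np.full((4, 5), float(i)), mask=np.zeros((4, 5), dtype=bool))
    for i in range(3)
]

# Indexing with np.newaxis appends a length-1 axis: this is where the extra 1 comes from
print(frames[0][..., np.newaxis].shape)

# Stack the 2D frames along a new last axis instead
volume = np.ma.stack(frames, axis=-1)
print(volume.shape)

# A single slice is 2D again, which is what plt.imshow accepts
slice_1 = volume[:, :, 1]
print(slice_1.shape)
```

Whether a plain stacked array is enough, or a point cloud or mesh is needed, depends on how the volume is meant to be viewed; this sketch only covers the array layout.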
I've got some tasks about classification and Object ROI.
So I have images and labels such as class and x1,y1,x2,y2 (standard box).
But the images differ in size. Is there a way to get the box coordinates after resizing?
What I mean: I have an image 300 px high and 400 px wide with box coordinates (x1,y1,x2,y2). Before training my DL model I have to resize all images to the same W and H, for example 200*200. Is there a way to calculate the new box coordinates x1new_after_resizing, y1new_after_resizing, x2new_after_resizing, y2new_after_resizing?
And are there any tips on which H and W to choose for resizing? The mean of all images? The median?
Thanks!
If you want to map coordinates from an image of size orig_width x orig_height to one of size new_width x new_height, you can scale the box coordinates in the following way:
width_scaled = new_width/orig_width
height_scaled = new_height/orig_height
x1_new = x1*width_scaled
y1_new = y1*height_scaled
x2_new = x2*width_scaled
y2_new = y2*height_scaled
You can plot these coordinates on the new image and check if you would like
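The scaling above can be wrapped in a small helper. This is a minimal sketch; the function name scale_box and the example box are made up for illustration, and sizes are passed as (width, height).

```python
def scale_box(box, orig_size, new_size):
    """Scale a box (x1, y1, x2, y2) from orig_size to new_size, both given as (width, height)."""
    x1, y1, x2, y2 = box
    orig_w, orig_h = orig_size
    new_w, new_h = new_size
    width_scaled = new_w / orig_w    # horizontal scale factor
    height_scaled = new_h / orig_h   # vertical scale factor
    return (x1 * width_scaled, y1 * height_scaled,
            x2 * width_scaled, y2 * height_scaled)


# A 400 x 300 (W x H) image resized to 200 x 200, as in the question
new_box = scale_box((100, 60, 300, 240), orig_size=(400, 300), new_size=(200, 200))
print(new_box)
```

The results are floats; round them only at the point where integer pixel indices are actually needed.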
There is no fixed method for choosing the dimensions to resize images to. It depends on various factors like the network, the GPU memory you have, the batch size, and the shape of the smallest/largest image in the dataset. Ideally, the size should not be so small or so far from the original aspect ratio that the images become incomprehensible or badly stretched.
You can refer to this post to get an idea of image resizing
I work at a studio that does school photos and we are trying to make a script to eliminate the job of cropping each photo to a template. The photos we work with are fairly uniform but they vary in resolution and head position a bit. I took up the mantle of trying to write the script with my fairly limited Python knowledge and through a lot of trial and error and online resources I think I have got most of the way there.
At the moment I am trying to figure out the best way to crop the image from the NumPy array with the head where I want it, and I just can't find a good, flexible solution. The head needs to be positioned slightly differently for pose 1 and pose 2, so it needs to be easy to change on the fly (I will probably implement some sort of simple GUI to input stuff like that, but for now I can just change the code).
I also need to be able to change the output resolution of the photo so they are all uniform (2000x2500). Anyone have any ideas?
At the moment this is my current code, it just saves the detected face square:
import cv2
import os.path
import glob
# Cascade path
cascPath = 'haarcascade_frontalface_default.xml'
# Create the haar cascade
faceCascade = cv2.CascadeClassifier(cascPath)
#Check for output folder and create if its not there
if not os.path.exists('output'):
    os.makedirs('output')
# Read Images
images = glob.glob('*.jpg')
for c, i in enumerate(images):
    image = cv2.imread(i, 1)
    # Convert to grayscale
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    # Find face(s) using cascade
    faces = faceCascade.detectMultiScale(
        gray,
        scaleFactor=1.1, # size of groups
        minNeighbors=5, # How many groups around are detected as face for it to be valid
        minSize=(500, 500) # Min size in pixels for face
    )
    # Outputs number of faces found in image
    print('Found {0} faces!'.format(len(faces)))
    # Places a rectangle on face
    for (x, y, w, h) in faces:
        imgCrop = image[y:y+h,x:x+w]
    if len(faces) > 0:
        #Saves Images to output folder with OG name
        cv2.imwrite('output/'+ i, imgCrop)
I can crop using it like this:
# Crop Padding
left = 300
right = 300
top = 400
bottom = 1000
for (x, y, w, h) in faces:
    imgCrop = image[y-top:y+h+bottom, x-left:x+w+right]
but that outputs pretty random resolutions and changes based on the image resolution
TL;DR
To set a new resolution, you can use cv2.resize. Resizing resamples the pixels, so choose an interpolation method explicitly.
cv2.imwrite expects BGR channel order, so if the crop is in RGB (as it is when the image was loaded with face_recognition or PIL), swap the channels before saving.
crop = cv2.resize(src=crop, dsize=(2000, 2500), interpolation=cv2.INTER_LANCZOS4)  # resize returns a new image
crop = cv2.cvtColor(crop, cv2.COLOR_RGB2BGR) # imwrite expects BGR channel order
cv2.imwrite("image-1.png", crop)
Suggestion:
One approach is using python's face-recognition library.
The approach is using two sample images for training.
Predict the next image based on training images.
For instance, the following are the training images:
We want to predict the faces in the below image:
When we get the facial encodings of the training images and apply to the next image:
import face_recognition
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image, ImageDraw
# Load a sample picture and learn how to recognize it.
first_image = face_recognition.load_image_file("images/ex.jpg")
first_face_encoding = face_recognition.face_encodings(first_image)[0]
# Load a second sample picture and learn how to recognize it.
second_image = face_recognition.load_image_file("images/index.jpg")
sec_face_encoding = face_recognition.face_encodings(second_image)[0]
# Create arrays of known face encodings and their names
known_face_encodings = [
first_face_encoding,
sec_face_encoding
]
print('Learned encoding for', len(known_face_encodings), 'images.')
# Load an image with an unknown face
unknown_image = face_recognition.load_image_file("images/babes.jpg")
# Find all the faces and face encodings in the unknown image
face_locations = face_recognition.face_locations(unknown_image)
face_encodings = face_recognition.face_encodings(unknown_image, face_locations)
# Convert the image to a PIL-format image so that we can draw on top of it with the Pillow library
# See http://pillow.readthedocs.io/ for more about PIL/Pillow
pil_image = Image.fromarray(unknown_image)
# Create a Pillow ImageDraw Draw instance to draw with
draw = ImageDraw.Draw(pil_image)
# Loop through each face found in the unknown image
for (top, right, bottom, left), face_encoding in zip(face_locations, face_encodings):
    matches = face_recognition.compare_faces(known_face_encodings, face_encoding)
    face_distances = face_recognition.face_distance(known_face_encodings, face_encoding)
    best_match_index = np.argmin(face_distances)
    draw.rectangle(((left, top), (right, bottom)), outline=(0, 0, 255), width=5)
# Remove the drawing library from memory as per the Pillow docs
del draw
# Display the resulting image
plt.imshow(pil_image)
plt.show()
The output will be:
The above is my suggestion. When you resize the current image to a new resolution, the pixels are resampled, so you need to choose an interpolation method.
For instance: after finding the face locations, select the coordinates in the original image.
# Add after draw.rectangle function.
crop = unknown_image[top:bottom, left:right]
Set the new resolution to 2000 x 2500 and the interpolation to cv2.INTER_LANCZOS4.
Possible question: why cv2.INTER_LANCZOS4?
Of course, you can select whichever method you like, but in this post cv2.INTER_LANCZOS4 was suggested.
crop = cv2.resize(src=crop, dsize=(2000, 2500), interpolation=cv2.INTER_LANCZOS4)  # resize returns a new image
Save the image
crop = cv2.cvtColor(crop, cv2.COLOR_RGB2BGR) # face_recognition loads RGB; imwrite expects BGR
cv2.imwrite("image-1.png", crop)
The outputs are around 4.3 MB, so I can't display them here.
From the final result, we clearly see and identify faces. The library precisely finds the faces in the image.
Here is what you can do:
Either you can use the training images of your own-set, or you can use the example above.
Apply the face-recognition function for each image, using the trained face-locations and save the results in the directory.
Here is how I got it to crop the way I wanted. This is added right below the "Outputs number of faces" print:
#Get the face position and output values into variables (uses the last detected face)
for (x, y, w, h) in faces:
    xdis = x
    ydis = y
#Get scale value by dividing wanted head height (600 px) by detected head height
ws = 600/w
hs = 600/h
#scale image to get head to right size, uses bilinear interpolation by default
#fx is the horizontal scale factor and fy the vertical one
scale = cv2.resize(image,(0,0),fx=ws,fy=hs)
#calculate head position for given values
sxdis = int(xdis*ws) #applying scale to x distance and turning it into an integer
sydis = int(ydis*hs) #applying scale to y distance and turning it into an integer
sycent = sydis+300 #adding half head height to get center
ystart = sycent-700 #subtract where you want the head center to be in pixels, this is for the vertical
yend = ystart+2500 #Add whatever you want vertical resolution to be
xcent = sxdis+300 #adding half head width to get center
xstart = xcent-1000 #subtract where you want the head center to be in pixels, this is for the horizontal
xend = xstart+2000 #add whatever you want the horizontal resolution to be
#Crop the image
cropped = scale[ystart:yend, xstart:xend]
It's a mess, but it works exactly how I wanted it to.
I ended up staying with OpenCV instead of switching to face-recognition because of speed, but I might switch over if I can get multithreading to work with face-recognition.
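The same arithmetic can be expressed as a crop window in the original image, so the photo is cropped first and resized only once. This is a minimal sketch under stated assumptions: the function name head_crop_window and its parameter names are made up for illustration, and the defaults (600 px head, 2000 x 2500 output, head center at (1000, 700)) are the values used in the code above.

```python
def head_crop_window(face, head_size=600, out_size=(2000, 2500), head_center=(1000, 700)):
    """Return the crop window (x0, y0, x1, y1) in source-image pixels.

    Cropping this window and resizing it to out_size makes the detected face
    head_size pixels tall and puts its center at head_center in the output.
    """
    x, y, w, h = face
    scale = head_size / h            # output pixels per source pixel
    out_w, out_h = out_size
    cx, cy = head_center
    # Center of the detected face in the source image
    face_cx = x + w / 2
    face_cy = y + h / 2
    # Walk back from the face center to the window's top-left corner
    x0 = face_cx - cx / scale
    y0 = face_cy - cy / scale
    # The window covers out_size measured in source pixels
    x1 = x0 + out_w / scale
    y1 = y0 + out_h / scale
    return tuple(round(v) for v in (x0, y0, x1, y1))


# Hypothetical face box: x=1000, y=800, 300 x 300 pixels
print(head_crop_window((1000, 800, 300, 300)))
```

The window can extend past the image edges, in which case it needs to be clamped or the image padded before slicing. The crop image[y0:y1, x0:x1] can then be passed to cv2.resize with dsize=(2000, 2500).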
I'm reading DICOM gray image file as
gray = dicom.dcmread(file).pixel_array
There I've got (x,y) shape but I need RGB (x,y,3) shape
I'm trying to convert using CV
img = cv2.cvtColor(gray, cv2.COLOR_GRAY2RGB)
And for testing I'm writing it to file cv2.imwrite('dcm.png', img)
I get an extremely dark image as output, which is wrong. What is the correct way to convert a pydicom image to RGB?
To answer your question, you need to provide a bit more info, and be a bit clearer.
First what are you trying to do? Are you trying to only get an (x,y,3) array in memory? or are you trying to convert the dicom file to a .png file? ...they are very different things.
Secondly, what modality is your dicom image?
It's likely (unless its ultrasound or perhaps nuc med) a 16 bit greyscale image, meaning the data is 16 bit, meaning your gray array above is 16 bit data.
So the first thing to understand is window levelling and how to display a 16-bit image in 8 bits. have a look here: http://www.upstate.edu/radiology/education/rsna/intro/display.php.
If it's a 16-bit image, if you want to view it as a greyscale image in rgb format, then you need to know what window level you're using or need, and adjust appropriately before saving.
Thirdly, as lenik mentioned above, you need to apply the DICOM slope/intercept values to your pixel data before using it.
If your problem is just making a new array with extra dimension for rgb (so sizes (r,c) to (r,c,3)), then it's easy
import numpy as np
# orig is the 2D array you read in with dcmread:
r, c = orig.shape
new = np.empty((r, c, 3), dtype=orig.dtype)
new[:,:,2] = new[:,:,1] = new[:,:,0] = orig
# or with broadcasting
new[:,:,:] = orig[:,:, np.newaxis]
That will give you the 3rd dimension. BUT the values will still all be 16-bit, not 8 bit as needed if you want it to be RGB. (Assuming your image you read with dcmread is CT, MR or equivalent 16-bit dicom - then the dtype is likely uint16).
If you want it to be RGB, then you need to convert the values to 8-bit from 16-bit. For that you'll need to decide on a window/level and apply it to select the 8-bit values from the full 16-bit data range.
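A minimal sketch of applying a window/level and converting to 8-bit RGB with NumPy. The function names window_to_uint8 and gray_to_rgb are hypothetical, and the level/width values are illustrative only (level 100 and width 400 correspond to a display range of -100 to 300); the input is assumed to already have slope/intercept applied.

```python
import numpy as np


def window_to_uint8(values, level, width):
    """Map values inside the window [level - width/2, level + width/2] to 0..255."""
    low = level - width / 2
    high = level + width / 2
    # Everything below the window becomes black, everything above becomes white
    clipped = np.clip(values.astype(np.float64), low, high)
    scaled = (clipped - low) / (high - low) * 255.0
    return scaled.round().astype(np.uint8)


def gray_to_rgb(gray8):
    """Repeat an 8-bit grayscale image on three channels: (r, c) -> (r, c, 3)."""
    return np.stack([gray8] * 3, axis=-1)


# Synthetic 2 x 2 "image" with values spanning well beyond the window
hu = np.array([[-1000.0, -100.0], [300.0, 2000.0]])
rgb = gray_to_rgb(window_to_uint8(hu, level=100, width=400))
print(rgb.shape, rgb.dtype)
```

The right window depends on the modality and on what should be visible, so the values here are placeholders to adjust.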
Likely your problem above - I've got extremely dark image on output which is wrong - is actually correct, but it's dark because the window/level cv is using by default makes it 'look' dark, or it's correct but you didn't apply the slope/intercept.
If what you want to do is convert the DICOM to png (or jpg), then you should probably use PIL or matplotlib rather than cv. Both of those offer easy ways to save a 16-bit 2D array (which is what your 'gray' is in your code above), and both allow you to specify window and level when saving to png or jpg. CV is complete overkill (meaning much bigger/slower to load, and a much steeper learning curve).
Some pseudocode using matplotlib. You need to adjust the vmin/vmax values; the ones here would be approximately OK for a CT image.
import matplotlib.pyplot as plt
from pydicom import dcmread
df = dcmread(file)
# rescale the stored pixel values using the slope/intercept from the DICOM header
slope = float(df.RescaleSlope)
intercept = float(df.RescaleIntercept)
df_data = intercept + df.pixel_array * slope
# tell matplotlib to 'plot' the image, with 'gray' colormap and set the
# min/max values (ie 'black' and 'white') to correspond to
# values of -100 and 300 in your array
plt.imshow(df_data, cmap='gray', vmin=-100, vmax=300)
# save as a png file
plt.savefig('png-copy.png')
That will save a png version, but with axes drawn as well. To save just the image, without axes or whitespace, use this:
inches = (3,3)
dpi = 150
fig, ax = plt.subplots(figsize=inches, dpi=dpi)
fig.subplots_adjust(left=0, right=1, top=1, bottom=0, wspace=0, hspace=0)
ax.imshow(df_data, cmap='gray', vmin=-100, vmax=300)
fig.savefig('copy-without-whitespace.png')
The full tutorial on reading DICOM files is here: https://www.kaggle.com/gzuidhof/full-preprocessing-tutorial
Basically, you have to extract the slope and intercept parameters from the DICOM file and do the math for every pixel: hu = pixel_value * slope + intercept. All of this is explained in the tutorial, with code samples and pictures.
I am using the following code to calculate the frequency or the MFCC coefficients of a wavelet signal. Once I have calculated my signals (frequency over time) as 2D numpy arrays, I try to store them locally as .png images. I have tried two different ways. Firstly, by using:
matplotlib.image.imsave("my_img.png", filter_banks)
That leads to:
and the second way using librosa tool:
import librosa.display
from matplotlib import cm
fig = plt.figure(figsize=(..., ...), dpi=1)
librosa.display.specshow(filter_banks.T, cmap=cm.jet)
plt.tight_layout()
plt.savefig("_plot_static_conv.png")
plt.show()
and the result looks like:
My issue is that I get an unwanted white margin around the image. How can I get the same size in the second case as well, and avoid the white margin that I guess is caused by plt.figure?
EDIT: I tried to use the answer from the following post but it did not solve my issue.
As a workaround: your white margin is about 4 pixels wide,
so you could save your second image with 8 more pixels in height and width,
then crop it using cv2:
import cv2
img = cv2.imread("image.png")
# x, y: top-left corner of the region to keep; w, h: its size (set these for your image)
crop_img = img[y:y+h, x:x+w]
cv2.imshow("cropped", crop_img)
cv2.waitKey(0)
as proposed in:
https://stackoverflow.com/a/15589825/4610938
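If the margin width is not known in advance, the crop box can be found from the pixel values instead of hard-coding it. This is a minimal sketch with NumPy only; trim_white_border is a hypothetical helper name, and it assumes the margin is pure white (255), which may not hold for every saved figure.

```python
import numpy as np


def trim_white_border(img, white=255):
    """Crop away rows and columns at the edges that contain only `white` pixels."""
    # A pixel counts as content if any of its channels differs from white
    content = np.any(img != white, axis=-1) if img.ndim == 3 else (img != white)
    rows = np.flatnonzero(content.any(axis=1))
    cols = np.flatnonzero(content.any(axis=0))
    if rows.size == 0:
        return img  # the image is entirely white, nothing to crop
    return img[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]


# Synthetic example: a 10 x 12 white image with a 2 x 4 dark block inside
demo = np.full((10, 12, 3), 255, dtype=np.uint8)
demo[4:6, 4:8] = 0
print(trim_white_border(demo).shape)
```

The array returned by cv2.imread can be passed in directly, and the result written back with cv2.imwrite.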