Image translation in PyTorch using the affine_grid and grid_sample functions - python

I want to shift the image by 1 or 2 pixels, so I specified small translation values (1.25, 1.9) in the affine matrix.
But the image is moved much too far, by what looks like hundreds of pixels.
(My input image is completely filled with yellow pineapples.)
Below is a working example.
import torch
import numpy as np
import matplotlib.pyplot as plt
from torchvision import datasets, transforms
import torch.nn.functional as F
rotation_simple = np.array([[1, 0, 1.25],
                            [0, 1, 1.9]])
# load image
transform = transforms.Compose([transforms.Resize(255),
                                transforms.CenterCrop(224),
                                transforms.ToTensor()])
dataloader = torch.utils.data.DataLoader(datasets.ImageFolder('/home/Pictures', transform=transform), shuffle=True)
dtype = torch.FloatTensor
# add the batch dimension once, outside the loop (doing it inside the loop
# would add another dimension on every iteration)
rotation_simple = torch.as_tensor(rotation_simple)[None]
i = 0
while i < 3:
    img, labels = next(iter(dataloader))
    img = img  # .double()  # sometimes a conversion to double is needed, sometimes not
    grid = F.affine_grid(rotation_simple, img.size()).type(dtype)
    x = F.grid_sample(img, grid)
    plt.imshow(x[0].permute(1, 2, 0))
    plt.show()
    i += 1
I wonder why the function moves the image so far instead of shifting it by just 1 or 2 pixels in the x and y directions.
PS: Setting align_corners=True didn't help in this case.
PPS: My PyTorch version is 1.4.0+cu100.

The units of the grid and of the affine transformation are not pixels but normalized coordinates. From the grid_sample documentation:
grid specifies the sampling pixel locations normalized by the input spatial dimensions. Therefore, it should have most values in the range of [-1, 1]. For example, values x = -1, y = -1 is the left-top pixel of input, and values x = 1, y = 1 is the right-bottom pixel of input.
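Therefore, translating by [1.25, 1.9] moves the image by more than half of its width and almost its entire height, because the whole image spans only 2 units in each direction. To translate by a given number of pixels, multiply the pixel offset by 2 and divide it by the image width (for x) or height (for y); for a 224-pixel-wide image, 1.25 pixels corresponds to 2 * 1.25 / 224.
See the documentation for grid_sample for more information.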
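As a rough sketch of that conversion: the normalized range [-1, 1] is 2 units wide, so one pixel corresponds to 2 / width units (or 2 / (width - 1) with align_corners=True). The helper name pixel_shift_to_theta is made up for illustration and is not part of PyTorch. The sign is negated on the assumption that you want the image content to move right/down for positive shifts, since the matrix tells grid_sample where to sample from.

```python
def pixel_shift_to_theta(shift_x, shift_y, width, height, align_corners=False):
    """Build a 2x3 affine matrix (nested lists) that shifts the image content
    by (shift_x, shift_y) pixels when used with affine_grid / grid_sample.

    Hypothetical helper for illustration; not a PyTorch function.
    """
    # [-1, 1] covers `width` pixels when align_corners=False and
    # `width - 1` pixel steps when align_corners=True.
    span_x = (width - 1) if align_corners else width
    span_y = (height - 1) if align_corners else height
    # The matrix maps output coordinates to input sampling locations, so
    # moving the content right/down means sampling further left/up.
    tx = -2.0 * shift_x / span_x
    ty = -2.0 * shift_y / span_y
    return [[1.0, 0.0, tx],
            [0.0, 1.0, ty]]


theta = pixel_shift_to_theta(1.25, 1.9, width=224, height=224)
print(theta)
```

The result can then be passed on as torch.tensor(theta)[None] to F.affine_grid, using the same align_corners value in both affine_grid and grid_sample.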

Related

Mean shift implementation for image segmentation does not produce the right output

I have to implement the mean shift algorithm and use it to segment an image. Here is the vegetable image:
I have to choose a suitable bandwidth so that the vegetables look as separated as possible. I computed the bandwidth once with sklearn's estimate_bandwidth and hard-coded the result. I am not allowed to use sklearn in the implementation; I can only use numpy, PIL or matplotlib.
Here is what I tried:
from PIL import Image
import numpy as np
import matplotlib.pyplot as plt
# Load the image
img = np.array(Image.open("peppers.jpg"))
# Convert the image to grayscale
gray_img = np.mean(img, axis=2)
# Flatten the image to a 2D array of pixel values
flat_img = gray_img.reshape((-1, 1))
# Define the distance metric
def euclidean_distance(x1, x2):
    return np.sqrt(np.sum((x1 - x2) ** 2))
# Bandwidth parameter, hard-coded from sklearn's estimate_bandwidth
bandwidth = 0.24570638879032147
# Perform Mean Shift clustering
centroids = []
for i, point in enumerate(flat_img):
    centroid = point
    converged = False
    while not converged:
        points_within_bandwidth = flat_img[euclidean_distance(flat_img, centroid) < bandwidth]
        new_centroid = np.mean(points_within_bandwidth, axis=0)
        if euclidean_distance(new_centroid, centroid) < 1e-5:
            converged = True
        centroid = new_centroid
    centroids.append(centroid)
# Assign each data point to a cluster based on its converged mean
labels = np.zeros_like(flat_img)
for i, centroid in enumerate(centroids):
    labels[euclidean_distance(flat_img, centroid) < bandwidth] = i
# Reshape the labels to the shape of the original image
segmented_img = labels.reshape(gray_img.shape)
# Display the segmented image
plt.imshow(segmented_img)
plt.show()
It took a long time to run and did not show the right output.
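Two things stand out in the code above. euclidean_distance sums over the whole array, so it returns a single number rather than one distance per pixel, and the boolean index built from it cannot select the neighbours of a centroid. Also, if the image has three 8-bit channels, the averaged gray levels are spaced 1/3 apart, so a bandwidth of about 0.25 never reaches a neighbouring gray level (the value was possibly estimated on data scaled to [0, 1]). Below is a minimal NumPy-only sketch of flat-kernel mean shift on intensities, under the assumption that clustering gray values alone is acceptable. The function name mean_shift_1d, the tolerance, and the merge threshold of bandwidth / 2 are illustrative choices, not part of any library.

```python
import numpy as np


def mean_shift_1d(values, bandwidth, tol=1e-3, max_iter=100):
    """Flat-kernel mean shift on scalar data (e.g. grayscale intensities).

    Returns an integer label array with the same shape as `values`.
    """
    values = np.asarray(values, dtype=float)
    # Pixels with equal intensity follow the same path, so shift each
    # distinct intensity once and weight it by how often it occurs.
    uniq, inverse, counts = np.unique(
        values.ravel(), return_inverse=True, return_counts=True)
    modes = uniq.copy()
    for _ in range(max_iter):
        # in_window[i, j] is True when uniq[j] lies in the window around modes[i]
        in_window = np.abs(modes[:, None] - uniq[None, :]) <= bandwidth
        weights = in_window * counts[None, :]
        new_modes = (weights * uniq[None, :]).sum(axis=1) / weights.sum(axis=1)
        converged = np.abs(new_modes - modes).max() < tol
        modes = new_modes
        if converged:
            break
    # Merge modes that ended up close together into one cluster:
    # walk through the sorted modes and start a new cluster at each large gap.
    order = np.argsort(modes)
    starts_new = np.diff(modes[order]) > bandwidth / 2
    cluster_sorted = np.concatenate(([0], np.cumsum(starts_new)))
    cluster_of_uniq = np.empty_like(cluster_sorted)
    cluster_of_uniq[order] = cluster_sorted
    return cluster_of_uniq[inverse].reshape(values.shape)


gray = np.array([[10, 11, 12], [200, 201, 202]])
print(mean_shift_1d(gray, bandwidth=5))
```

Because the work is done per distinct gray level instead of per pixel, the cost no longer depends on the image size, which should help with the run time. With gray_img on a 0-255 scale, the bandwidth would need to be on the same scale.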

determining the average colour of a given circular sample of an image?

What I am trying to achieve is similar to photoshop/gimp's eyedropper tool: take a round sample of a given area in an image and return the average colour of that circular sample.
The simplest method I have found is to take a 'regular' square sample, mask it as a circle, then reduce it to 1 pixel, but this is very CPU-demanding (especially when repeated millions of times).
A more mathematically complex method is to take a square area and average only the pixels that fall within a circular area within that sample, but determining what pixel is or isn't within that circle, repeated, is CPU-demanding as well.
Is there a more succinct, less-CPU-demanding means to achieve this?
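Here's a little example of skimage.draw.circle(), which doesn't actually draw a circle but gives you the coordinates of the points within a circle, which you can use to index NumPy arrays. (skimage.draw.circle was removed in scikit-image 0.19; in newer versions, skimage.draw.disk((row, col), radius) is the replacement.)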
#!/usr/bin/env python3
import numpy as np
from skimage.io import imsave
from skimage.draw import circle
# Make rectangular canvas of mid-grey
w, h = 200, 100
img = np.full((h, w), 128, dtype=np.uint8)
# Get coordinates of points within a central circle
Ycoords, Xcoords = circle(h//2, w//2, 45)
# Make all points in circle=200, i.e. fill circle with 200
img[Ycoords, Xcoords] = 200
# Get mean of points in circle
print(img[Ycoords, Xcoords].mean()) # prints 200.0
# DEBUG: Save image for checking
imsave('result.png',img)
I'm sure that there's a more succinct way to go about it, but:
import math
import numpy as np
import imageio as ioimg # as scipy's i/o function is now deprecated
from skimage.draw import circle
import matplotlib.pyplot as plt
# base sample dimensions (rest below calculated on this).
# Must be an odd number.
wh = 49
# tmp - this placement will be programmed later
dp = 500
#load work image (from same work directory)
img = ioimg.imread('830.jpg')
# convert to numpy array (dropping the alpha while we're at it)
np_img = np.array(img)[:,:,:3]
# take sample of resulting array
sample = np_img[dp:wh+dp, dp:wh+dp]
#==============
# set up numpy circle mask
## this mask will be multiplied against each RGB layer in extracted sample area
# set up basic square array
sample_mask = np.zeros((wh, wh), dtype=np.uint8)
# set up circle centre coords and radius values
xy, r = math.floor(wh/2), math.ceil(wh/2)
# use these values to populate circle area with ones
rr, cc = circle(xy, xy, r)
sample_mask[rr, cc] = 1
# add a trailing axis so the mask broadcasts across the RGB channels
sample_mask = sample_mask[:, :, np.newaxis]
result = sample * sample_mask
# count number of nonzero values (this will be our mean divisor)
nz = np.count_nonzero(sample_mask)
sample_color = []
for c in range(result.shape[2]):
    sample_color.append(int(round(np.sum(result[:,:,c])/nz)))
print(sample_color) # prints a list like [225, 205, 170]
plt.imshow(result, interpolation='nearest')
plt.show()
Perhaps asking this question here wasn't necessary (it has been a while since I've python-ed, and was hoping that some new library had been developed for this since), but I hope this can be a reference for others who have the same goal.
This operation will be performed for every pixel in the image (sometimes millions of times) for thousands of images (scanned pages), hence my performance worries, but thanks to numpy, this code is pretty quick.
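Since the same sample size is reused many times, one possible shortcut is to build the circular mask once as a boolean array and let NumPy's boolean indexing pick the pixels inside it, which avoids the multiplication and the per-channel loop. This is only a sketch with plain NumPy; disk_mask and circular_mean are made-up helper names, and the mask here is the disk inscribed in the square, which may differ by a few border pixels from what skimage produces.

```python
import numpy as np


def disk_mask(diameter):
    """Boolean (diameter, diameter) mask that is True inside the inscribed disk."""
    radius = (diameter - 1) / 2
    yy, xx = np.ogrid[:diameter, :diameter]
    return (yy - radius) ** 2 + (xx - radius) ** 2 <= radius ** 2


def circular_mean(img, top, left, mask):
    """Mean colour of the disk whose bounding square starts at (top, left)."""
    d = mask.shape[0]
    sample = img[top:top + d, left:left + d]
    # Boolean indexing keeps only the pixels inside the disk: shape (n, channels)
    return sample[mask].mean(axis=0)


mask = disk_mask(49)                      # built once, reused for every sample
img = np.full((600, 600, 3), 100, dtype=np.uint8)
img[500, 500] = 255                       # a corner of the square, outside the disk
print(circular_mean(img, 500, 500, mask))
```

The corner pixel set to 255 does not affect the result because it lies outside the disk.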

label2rgb implementation for OpenCV

Does OpenCV have a function that can visualise a Mat of labels, i.e. something similar to MATLAB's label2rgb()?
The closest I can find is: cv2.applyColorMap(cv2.equalizeHist(segments), cv2.COLORMAP_JET)
However, this is not suitable for segmenting video, where the number of labels changes from one frame to the next. The reason: one frame will have 2 labels (0 and 1, representing sky and ground), so with jet those 2 segments might be shown as dark blue and red respectively. The next frame has 3 labels (0, 1, 2: sky, ground and car), so the ground segment has now changed colour from red to yellow. When you visualise this, the same segment keeps changing colour instead of staying one consistent colour (red).
Therefore a function like MATLAB's label2rgb() would be really useful, if it exists.
I like to use cv2.LUT when there are no more than 256 labels (since it only works with uint8). If you have more than 256 labels, you can always map them onto 256 values using (labels % 256).astype(np.uint8), at the cost of some labels sharing a colour.
Then, with your labels, you simply call: rgb = cv2.LUT(labels, lut).
The only remaining problem is to create a lookup table (lut) for your labels. You can use matplotlib colormaps as follows:
import numpy as np
import matplotlib.pyplot as plt
import cv2
def label2rgb(labels):
    """
    Convert a labels image to an rgb image using a matplotlib colormap
    """
    label_range = np.linspace(0, 1, 256)
    # replace viridis with a matplotlib colormap of your choice;
    # [:, 2::-1] drops alpha and reorders RGB to BGR, and scaling by 255
    # (not 256) keeps a channel value of 1.0 from overflowing uint8
    lut = np.uint8(plt.cm.viridis(label_range)[:, 2::-1] * 255).reshape(256, 1, 3)
    return cv2.LUT(cv2.merge((labels, labels, labels)), lut)
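The point for video is that the lookup table is built once and reused, so a label value always maps to the same colour no matter how many labels a frame contains. Here is a small NumPy-only sketch of that idea. PALETTE and colorize are hypothetical names, and the random palette is just a stand-in for whatever lookup table you prefer.

```python
import numpy as np

# Hypothetical fixed palette: one BGR colour per label value, generated once
# with a fixed seed so that it is identical for every frame.
rng = np.random.default_rng(0)
PALETTE = rng.integers(0, 256, size=(256, 3), dtype=np.uint8)


def colorize(labels, palette=PALETTE):
    """Map an integer label image (H, W) to a colour image (H, W, 3)."""
    return palette[np.asarray(labels) % len(palette)]


frame_a = np.array([[0, 0], [1, 1]])   # sky, ground
frame_b = np.array([[0, 2], [1, 1]])   # sky, car, ground

# The ground label (1) gets the same colour in both frames.
print(colorize(frame_a)[1, 0], colorize(frame_b)[1, 0])
```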
In many cases it is better for the colours of adjacent labels to be wildly different. Rick Szeliski gives pseudocode to achieve this in his book, appendix C2: Pseudocolor Generation. I've worked with his algorithm and variants of it in the past, and it is fairly straightforward to code up. Here is some sample code using his algorithm:
import numpy as np
import cv2
def gen_lut():
    """
    Generate a label colormap compatible with opencv lookup table, based on
    Rick Szeliski algorithm in `Computer Vision: Algorithms and Applications`,
    appendix C2 `Pseudocolor Generation`.
    :Returns:
      color_lut : opencv compatible color lookup table
    """
    tobits = lambda x, o: np.array(list(np.binary_repr(x, 24)[o::-3]), np.uint8)
    arr = np.arange(256)
    r = np.concatenate([np.packbits(tobits(x, -3)) for x in arr])
    g = np.concatenate([np.packbits(tobits(x, -2)) for x in arr])
    b = np.concatenate([np.packbits(tobits(x, -1)) for x in arr])
    return np.concatenate([[[b]], [[g]], [[r]]]).T

def labels2rgb(labels, lut):
    """
    Convert a label image to an rgb image using a lookup table
    :Parameters:
      labels : an image of type np.uint8 2D array
      lut : a lookup table of shape (256, 1, 3) and type np.uint8
    :Returns:
      colorized_labels : a colorized label image
    """
    return cv2.LUT(cv2.merge((labels, labels, labels)), lut)

if __name__ == '__main__':
    labels = np.arange(256).astype(np.uint8)[np.newaxis, :]
    lut = gen_lut()
    rgb = labels2rgb(labels, lut)
And here is the colormap: (image not reproduced)

How to represent a binary image as a graph with the axis being height and width dimensions and the data being the pixels

I am trying to use Python along with OpenCV, NumPy and Matplotlib to do some computer vision for a robot which will use a railing to navigate. I am currently extremely stuck and have run out of places to look. My current code is:
import cv2
import numpy as np
import matplotlib.pyplot as plt
image = cv2.imread('railings.jpg')
railing_image = np.copy(image)
resized_image = cv2.resize(railing_image,(881,565))
gray = cv2.cvtColor(resized_image, cv2.COLOR_BGR2GRAY)
blur = cv2.GaussianBlur(gray, (5, 5), 0)
canny = cv2.Canny(blur, 85, 255)
cv2.imshow('test',canny)
image_array = np.array(canny)
ncols, nrows = image_array.shape
count = 0
scan = np.array
for x in range(0,image_array.shape[1]):
    for y in range(0,image_array.shape[0]):
        if image_array[y, x] == 0:
            count += 1
            scan = [scan, count]
print(scan)
plt.plot([0, count])
plt.axis([0, nrows, 0, ncols])
plt.show()
cv2.waitKey(0)
I am using a Canny edge image, which is stored as an array of 1's and 0's. The image I need represented is:
The final result should look something like the following image.
I've tried using a histogram function, but I've only managed to get it to output a count of the number of times a 1 or 0 appears.
I would be grateful if anyone could help me, or point me towards a way to produce a graph that represents the image pixels with the axes being the height and width dimensions.
Thank you
I'm not sure how general this is, but in your case you could just use NumPy's argmax to get the location of the first pixel above a threshold in each column. You should avoid loops, as they will be very slow; it is better to use NumPy functions. I've imported your image and used the cutoff criterion that a value above 200 in one colour channel is railing:
import cv2
import numpy as np
import matplotlib.pyplot as plt
#This loads the canny image you uploaded
image = cv2.imread('uojHJ.jpg')
#Trim off the top taskbar
trimimage = image[100:, :,0]
#Use argmax with 200 cutoff colour in one channel
maxindex = np.argmax(trimimage[:,:]>200, axis=0)
#Plot graph
plt.plot(trimimage.shape[0] - maxindex)
plt.show()
This plots, for each column, the height of the first railing pixel (figure not reproduced).
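One caveat with argmax on a boolean array is that it also returns 0 for columns that contain no pixel above the threshold, which would show up as a spike at the full image height. A possible way to handle that, sketched here on a tiny synthetic array instead of the real image (first_edge_height is a made-up name, and marking empty columns with NaN is just one option):

```python
import numpy as np


def first_edge_height(edges, threshold=200):
    """For each column, height above the bottom row of the first (topmost)
    pixel brighter than `threshold`; NaN where the column has no such pixel."""
    mask = np.asarray(edges) > threshold
    first_row = np.argmax(mask, axis=0)      # index of first True, 0 if none
    height = mask.shape[0] - first_row       # flip so that up is positive
    has_edge = mask.any(axis=0)
    return np.where(has_edge, height, np.nan)


edges = np.zeros((5, 4), dtype=np.uint8)
edges[1, 0] = 255
edges[3, 1] = 255
edges[4, 2] = 255                            # column 3 stays empty
print(first_edge_height(edges))
```

Matplotlib leaves gaps at NaN values, so plt.plot(first_edge_height(trimimage)) would simply skip the columns without railing.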

Does SciPy have any function for extracting data from image?

I have to analyze the data given as an image like:
What I do is:
1. Erase the axes manually.
2. Convert the image to (x, y) coordinates with ImageMagick (collecting the coordinates of the black pixels).
3. Adjust the (x, y) values to the axis values rather than the pixel coordinates, and flip the y direction, since in images the y coordinate increases from top to bottom.
4. Sort the data by x.
5. Load the data in a SciPy script.
I wonder if there is a function to do any of the steps 1-4 in the same SciPy script.
Since SciPy has a range of functions for image recognition, I would like to know if there is a function to translate an image into the (x,y) coordinates of the black pixels creating the curve, and so on.
First, load the image with sp.misc.imread or PIL so that it resides in a numpy array that I'll refer to as img below. (sp.misc.imread was removed in SciPy 1.2; imageio.imread or PIL can be used instead.) If it is a color image, convert it to grayscale with img = np.mean(img, axis=-1). Then:
img = img[:-k, :] # Erase axes manually, k is the height of the axis in pixels
y, x = np.where(img == 0) # Find x and y coordinates of all black pixels
y = y.max() - y # invert y axis so (0,0) is in the bottom left corner
i = np.argsort(x); x, y = x[i], y[i] # Sort data by x
Assumed imports:
import numpy as np
import scipy as sp
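The snippet above leaves x and y in pixel units. If the axis ranges of the plot are known, the rescaling to axis values (step 3 in the question) could look roughly like this. It is a sketch that assumes linear axes and that the cropped image spans exactly x_range by y_range; curve_from_image and its parameters are illustrative names only.

```python
import numpy as np


def curve_from_image(img, x_range, y_range, black_threshold=128):
    """Return (x, y) data coordinates of the dark pixels, sorted by x.

    Assumes linear axes and that the image covers exactly
    x_range = (x_min, x_max) and y_range = (y_min, y_max).
    """
    img = np.asarray(img)
    rows, cols = np.where(img < black_threshold)
    height, width = img.shape
    # Column 0 maps to x_min and the last column to x_max
    x = x_range[0] + cols / (width - 1) * (x_range[1] - x_range[0])
    # Rows grow downwards in images, so flip them before scaling
    y = y_range[0] + (height - 1 - rows) / (height - 1) * (y_range[1] - y_range[0])
    order = np.argsort(x, kind="stable")
    return x[order], y[order]


# Tiny synthetic "plot": a diagonal from bottom left to top right
img = np.full((3, 3), 255, dtype=np.uint8)
img[2, 0] = img[1, 1] = img[0, 2] = 0
x, y = curve_from_image(img, x_range=(0, 10), y_range=(0, 1))
print(x, y)
```

Using a threshold instead of img == 0 is a deliberate choice here, because anti-aliased curves rarely consist of exactly black pixels.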
