What is the difference between Mat and MatND? - Python

I am trying to extract data from a binary mask. All goes well, but switching to Python causes the data to shift a few pixels, enough that I cannot find the center. Oddly enough, saving the image displays the pixels at the correct location.
Here is my code. I basically create a normal Mat to use as the output; however, according to the docs a MatND is what gets output.
Am I extracting the data properly? I am trying to find the center given points along the center, and I kinda don't want my data to be shifted.
import cv2.cv as cv
import numpy as np

def main():
    imgColor = cv.LoadImage(OPTICIMAGE, cv.CV_LOAD_IMAGE_COLOR)
    center, radius = centerandradus(imgColor)

def centerandradus(cvImg, ColorLower=None, ColorUpper=None):
    lowerBound = cv.Scalar(130, 0, 130)
    upperBound = cv.Scalar(171, 80, 171)
    size = cv.GetSize(cvImg)
    output = cv.CreateMat(size[0], size[1], cv.CV_8UC1)
    cv.InRangeS(cvImg, lowerBound, upperBound, output)
    mask = np.asarray(output[:, :])
    x, y = np.nonzero(mask)
    x, y = np.array(x), np.array(y)
    h, k = centerEstimate(x, y)
    return np.array([h, k]), radius

def centerEstimate(xList, yList):
    x_m = np.mean(np.r_[xList])
    y_m = np.mean(np.r_[yList])
    return x_m, y_m
Edit: I think the problem is with MatND, since I notice the data is already shifted when I try to print it out. If you need any more information, please ask.
Thank you for your time.

It seems there is no longer any difference between Mat and MatND; MatND is now obsolete.
Looking at opencv2/core.hpp (version 2.4.8):
typedef Mat MatND;

I learned that the orientation of the data is different between findContours and this matrix: the matrix is indexed as height x width, while the contour points are given as width x height. I hate reading APIs.
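To make the mismatch concrete, here is a minimal sketch (my own illustration, plain NumPy): np.nonzero returns indices in (row, column) order, i.e. (y, x), while OpenCV contour points are stored as (x, y).

import numpy as np

mask = np.zeros((4, 6), dtype=np.uint8)  # 4 rows (height) x 6 columns (width)
mask[1, 5] = 255                         # set the pixel at row 1, column 5

rows, cols = np.nonzero(mask)  # NumPy indexing: (y, x)
print(rows[0], cols[0])        # -> 1 5
# A findContours point for the same pixel would be (x, y) = (5, 1),
# so swap the axes before comparing the two.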


Rotating a reshaped image as a matrix operation

I have a grayscale image that I want to rotate. However, I need to do optimization on it, so I cannot use Pillow or OpenCV.
I want to reshape this image with numpy.reshape into a one-dimensional vector (using the default C-style reshape).
Afterwards, I want to rotate this image around a point using matrix multiplication and addition, i.e. it should be something like
rotated_image_vector = A @ vector + b  # (or the equivalent in homogeneous coordinates)
After this operation I want to reshape the outcome back to two dimensions and have the rotated image.
It would be best if it also used linear interpolation between the pixels that do not map exactly onto another pixel.
The mathematical theory says it is possible, and I believe there is a very elegant solution to this problem, but I do not see how to construct this matrix. Has anyone already had this problem, or does anyone see an immediate solution?
Thanks a lot,
Eike
I like your approach, but there is a slight misconception in it. What you want to transform are not the pixel values themselves but the coordinates. So you don't reshape your image but rather call np.indices on it to obtain coordinates for each pixel. For those, a rotation around a point looks like
rotation_matrix @ (coordinates - fixed_point) + fixed_point
except that I have to transpose a bit to get the dimensions to align. The code below is a slight adaptation of my code in this answer.
As an example I am going to use the Wikipedia-logo-v2 by Nohat. It is licensed under the Creative Commons Attribution-Share Alike 3.0 Unported license.
First I read in the picture, swap the x and y axes so as not to go mad, and rotate the coordinates as described above.
import numpy as np
import matplotlib.pyplot as plt
import itertools

image = plt.imread('wikipedia.jpg')
image = np.swapaxes(image, 0, 1)/255
fixed_point = np.array(image.shape[:2], dtype='float')/2
points = np.moveaxis(np.indices(image.shape[:2]), 0, -1).reshape(-1, 2)
a = 2*np.pi/8
A = np.array([[np.cos(a), -np.sin(a)], [np.sin(a), np.cos(a)]])
rotated_coordinates = (A@(points-fixed_point.reshape(1, 2)).T).T+fixed_point.reshape(1, 2)
Now I set up a little class to interpolate between the pixels that do not map exactly onto another pixel. Finally I swap the axes back and plot the result.
class Image_knn():
    def fit(self, image):
        self.image = image.astype('float')

    def predict(self, x, y):
        image = self.image
        weights_x = [(1-(x % 1)).reshape(*x.shape, 1), (x % 1).reshape(*x.shape, 1)]
        weights_y = [(1-(y % 1)).reshape(*x.shape, 1), (y % 1).reshape(*x.shape, 1)]
        start_x = np.floor(x)
        start_y = np.floor(y)
        return sum([image[np.clip(np.floor(start_x + x), 0, image.shape[0]-1).astype('int'),
                          np.clip(np.floor(start_y + y), 0, image.shape[1]-1).astype('int')] * weights_x[x]*weights_y[y]
                    for x, y in itertools.product(range(2), range(2))])

image_model = Image_knn()
image_model.fit(image)
transformed_image = image_model.predict(*rotated_coordinates.T).reshape(*image.shape)
plt.imshow(np.swapaxes(transformed_image, 0, 1))
And I get a result like this
Possible Issue
The artifact in the bottom left, which looks like somebody needs to clean the screen, comes from the following problem: when we rotate, it can happen that we don't have enough pixels to paint the lower left. What Image_knn does by default is clip the coordinates to an area where we have information, so when we ask it for pixels coming from outside the image, it gives us the pixels at the boundary of the image. This looks good if there is a background, but if an object touches the edge of the picture it looks odd, as it does here. Just something to keep in mind when using this.
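If the artifact matters for your use case, one possible workaround (a sketch of my own, not part of the original answer; it assumes the Image_knn class above and 1-D coordinate arrays) is to mask out-of-image samples and paint them with a constant background instead of reusing boundary pixels:

def predict_with_fill(model, x, y, fill=1.0):
    # Like Image_knn.predict, but paints samples taken from outside the image
    # with a constant `fill` value (e.g. 1.0 = white for images scaled to [0, 1]).
    values = model.predict(x, y)
    h, w = model.image.shape[:2]
    outside = (x < 0) | (x > h - 1) | (y < 0) | (y > w - 1)
    values[outside] = fill
    return values

Calling predict_with_fill(image_model, *rotated_coordinates.T).reshape(*image.shape) would then replace the predict call above.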
Thank you for your answer!
But actually it is not a misconception that you can let this rotation be represented by a matrix multiplication with the reshaped vector.
I used your code to generate such a matrix (it's surely not the most efficient way, but it works; most likely you see a more efficient implementation immediately XD. You see I really need it as a matrix multiplication :-D).
What I basically did is generate the representation matrix of the linear transformation, by computing how each of the 100*100 basis images (i.e. the image with zeros everywhere and a single one) is mapped by your transformation.
import sys
import numpy as np
import matplotlib.pyplot as plt
import itertools

angle = 2*np.pi/6
image_expl = plt.imread('wikipedia.jpg')
image_expl = image_expl[:, :, 0]
plt.imshow(image_expl)
plt.title("Image")
plt.show()

image_shape = image_expl.shape
pixel_number = image_shape[0]*image_shape[1]

rot_mat = np.zeros((pixel_number, pixel_number))
for i in range(pixel_number):
    vector = np.zeros(pixel_number)
    vector[i] = 1
    image = vector.reshape(*image_shape)
    fixed_point = np.array(image.shape, dtype='float')/2
    points = np.moveaxis(np.indices(image.shape), 0, -1).reshape(-1, 2)
    a = -angle
    A = np.array([[np.cos(a), -np.sin(a)], [np.sin(a), np.cos(a)]])
    rotated_coordinates = (A@(points-fixed_point.reshape(1, 2)).T).T+fixed_point.reshape(1, 2)
    x, y = rotated_coordinates.T
    image = image.astype('float')
    weights_x = [(1-(x % 1)).reshape(*x.shape), (x % 1).reshape(*x.shape)]
    weights_y = [(1-(y % 1)).reshape(*x.shape), (y % 1).reshape(*x.shape)]
    start_x = np.floor(x)
    start_y = np.floor(y)
    transformed_image_returned = sum([image[np.clip(np.floor(start_x + x), 0, image.shape[0]-1).astype('int'),
                                            np.clip(np.floor(start_y + y), 0, image.shape[1]-1).astype('int')] * weights_x[x]*weights_y[y]
                                      for x, y in itertools.product(range(2), range(2))])
    rot_mat[:, i] = transformed_image_returned
    if i % 100 == 0: print(int(100*i/pixel_number), "% finished")
plt.imshow((rot_mat @ image_expl.reshape(-1)).reshape(image_shape))
Thank you again :-)

Plotting a new image without using the old one in matplotlib?

I'm new to Python and matplotlib.
I have implemented the k-means algorithm in order to compress an image into clusters and then plot the changed image.
My question is: I was not able to plot the new image without using the old one as a base. I tried a few things but could not quite get the result I want, and it is bad programming to pass the old image as an argument when I should not need it at all.
Can someone please help?
I tried to create a new ndarray, but it did not work.
Here is my function:
def changePic(newPixelList, oldPixel, image_size):
    index = 0
    new_pixels = []
    for pixel in newPixelList:
        oldPixel[index] = pixel.classification
        index += 1
    l = oldPixel.reshape(image_size)
    plt.imshow(l)
    plt.grid(False)
    plt.show()
As you can see, I don't really use the oldPixel values, just its structure.
Here is my loadPic method, where X.copy() is the argument passed as oldPixel:
def loadPic():
    """
    Load pic to array
    :return: copy of original X, new list of pixels, image size
    """
    # data preparation (loading, normalizing, reshaping)
    path = 'dog.jpeg'
    A = imread(path)
    A = A.astype(float) / 255.
    img_size = A.shape
    X = A.reshape(img_size[0] * img_size[1], img_size[2])
    listOfPixel = []
    for pixel in X:
        listOfPixel.append(Pixel(pixel))
    return X.copy(), listOfPixel, img_size
Try this:
def changePic(newPixelList, oldPixel, image_size, picture_num):
    index = 0
    new_pixels = []
    for pixel in newPixelList:
        oldPixel[index] = pixel.classification
        index += 1
    l = oldPixel.reshape(image_size)
    plt.figure(picture_num)
    plt.imshow(l)
    plt.grid(False)
    plt.show()
Every plot that you generate should have a different picture_num in order to produce separate figures.
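For example (a minimal standalone sketch with random data, just to show the effect of the figure number):

import numpy as np
import matplotlib.pyplot as plt

img_a = np.random.rand(5, 5)
img_b = np.random.rand(5, 5)

plt.figure(1)   # first figure
plt.imshow(img_a)
plt.figure(2)   # a second, independent figure
plt.imshow(img_b)
plt.show()      # the two images appear in separate windows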

Does anyone know how to extract image coordinates from the Marmot dataset?

Marmot is a document image dataset (http://www.icst.pku.edu.cn/cpdp/data/marmot_data.htm) that labels several things such as document body, image area, table area, table caption, and so on. The dataset is used for document image analysis research. All coordinates are given as 16-digit hexadecimal values in little-endian format. Has anyone worked with this dataset, and how do you convert those 16-digit XY coordinates into a human-readable format?
I finally got the clue after some analysis, and I am posting it here in case anyone needs to investigate this dataset. They do mention the unit used to convert the given coordinates into pixel values, but it was difficult to trace because it is not in the manual/guideline; it is only mentioned elsewhere as an annotation.
First you have to convert each 16-character hexadecimal value using the IEEE 754 little-endian format. For example, the given coordinates for a label are
BBox = ['4074145c00000005', '4074dd95999999a9', '4080921e74bc6a80', '406fb9999999999a']
Convert using Python (2.x):
import struct
conv_pound = [struct.unpack('!d', str(t).decode('hex'))[0] for t in BBox]
You will get the values in the "pound" unit, which is 1/72 inch. We usually use coordinates in pixel units, and 1 inch is 96 pixels. So,
conv_pound = [321.2724609375003, 333.8490234375009, 530.2648710937501, 253.8]
Then divide each value by 72 and multiply by 96 to get the corresponding pixel values:
in_pixel = [428.36328, 445.13203, 707.01983, 338.40000]
They count pixel positions from the bottom-left corner of the document image. If you count from the top-left corner (as is usual), you have to subtract the 2nd and 4th values from the image height. If the image [height, width] is [1123, 793], we can represent the above coordinates as integers:
label_boundary = [428, 678, 707, 785]
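Putting the three steps together (a Python 3 sketch of my own, using the example BBox above and an assumed image height of 1123):

import struct

BBox = ['4074145c00000005', '4074dd95999999a9', '4080921e74bc6a80', '406fb9999999999a']
img_height = 1123

# Step 1: hex string -> IEEE 754 double (value in 1/72 inch units)
points = [struct.unpack('!d', bytes.fromhex(t))[0] for t in BBox]
# Step 2: 1/72 inch -> pixels (96 pixels per inch)
pixels = [p / 72 * 96 for p in points]
# Step 3: flip the y values to count from the top-left corner
x0, y0, x1, y1 = pixels
label_boundary = [round(x0), round(img_height - y0), round(x1), round(img_height - y1)]
print(label_boundary)  # [428, 678, 707, 785]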
After staring at the XMLs for an hour, I found the last missing piece in the answer by @MMReza:
You don't need to rely on the units of measure (step 3). The root element "Page" has an attribute called "CropBox"; use that one to scale the coordinates.
I have something along the following lines (also inverting the y axis here):
px0, py1, px1, py0 = list(map(hex_to_double, page.get("CropBox").split()))
pw = abs(px1 - px0)
ph = abs(py1 - py0)
for table in page.findall(".//Composite[@Label='TableBody']"):
    x0p, y1m, x1p, y0m = list(map(hex_to_double, table.get("BBox").split()))
    x0 = round(imgw*(x0p - px0)/pw)
    x1 = round(imgw*(x1p - px0)/pw)
    y0 = round(imgh*(py1 - y0m)/ph)
    y1 = round(imgh*(py1 - y1m)/ph)
In case anyone is trying to do this in Python 3 like I did, you only have to change step 2 of the other answer like this :
conv_pound = [struct.unpack('!d', bytes.fromhex(t))[0] for t in BBox]
I wanted to convert the coordinates and also to verify that my conversion actually worked, so I made this script to read a label file and the corresponding image file, extract the coordinates of the table body (for example), and visualize them on the image. It can be used to extract other fields in a similar manner. The comments explain it all.
import glob
import struct
import cv2
import binascii
import re

xml_files = glob.glob("path_to_labeled_files/*.xml")
for i in xml_files:
    # Open the current file and read everything
    cur_file = open(i, "r")
    content = cur_file.read()
    # Find index of all occurrences of only needed portions (e.g. TableBody in this case)
    residxs = [l.start() for l in re.finditer('Label="TableBody"', content)]
    # Read the image
    img = cv2.imread("path_to_images_folder/" + i.split('/')[-1][:-3] + "jpg")
    # Traverse over all occurrences
    for r in residxs[:-1]:
        # List to store output points
        coords = []
        # Start index of an occurrence
        sidx = r
        # Substring from whole file content
        substr = content[sidx:sidx+400]
        # Now find start index and end index of coordinates in this substring
        sidx = substr.find('BBox="')
        eidx = substr.find('" CLIDs')
        # String containing only points
        points = substr[sidx+6:eidx]
        # Make the conversion (also take care of little and big endian in unpack)
        for j in points.split(' '):
            if j == '':
                continue
            coords.append(struct.unpack('>d', binascii.unhexlify(j))[0])
        if len(coords) != 4:
            continue
        # As suggested by MMReza
        for k in range(4):
            coords[k] = (coords[k]/72)*96
        coords[1] = img.shape[0] - coords[1]
        coords[3] = img.shape[0] - coords[3]
        # Print the extracted coordinates
        print(coords)
        # Visualize it on the image
        cv2.rectangle(img, (int(coords[0]), int(coords[1])), (int(coords[2]), int(coords[3])), (255, 0, 0), 2)
        cv2.imshow("frame", img)
        cv2.waitKey(0)

Python memory-intensive script

After about 4 weeks of learning, experimenting, etc. I finally have a script that does what I want. It changes the perspective of images according to a certain projection matrix I have created. When I run the script for one image it works fine; however, when I try to plot six images in one figure, I get a memory error.
All the images are 2448 px in width and 2048 px in height. My script:
from PIL import Image
import numpy as np
import matplotlib.pyplot as plt

files = {'cam1': 'c1.jpg',
         'cam2': 'c2.jpg',
         'cam3': 'c3.jpg',
         'cam4': 'c4.jpg',
         'cam5': 'c5.jpg',
         'cam6': 'c6.jpg'}

fig, ax = plt.subplots()

for camname in files:
    img = Image.open(files[camname])
    gray_img = np.asarray(img.convert("L"))
    img = np.asarray(img)
    height, width, channels = img.shape
    usedP = np.array(P[camname][:, [0, 1, 3]])
    usedPinv = np.linalg.inv(usedP)
    U, V = np.meshgrid(range(gray_img.shape[1]),
                       range(gray_img.shape[0]))
    UV = np.vstack((U.flatten(),
                    V.flatten())).T
    ones = np.ones((UV.shape[0], 1))
    UV = np.hstack((UV, ones))
    # create UV_warped
    UV_warped = usedPinv.dot(UV.T).T
    # normalize vector by dividing by the third column (which should be 1)
    normalize_vector = UV_warped[:, 2].T
    UV_warped = UV_warped/normalize_vector[:, None]
    # masks
    # pixels that are above the horizon, where the V-projection is therefore positive (X in argus): make 0, 0, 1
    # pixels that are too far: make 0, 0, 1
    masks = [UV_warped[:, 0] <= 0, UV_warped[:, 0] > 2000, UV_warped[:, 1] > 5000, UV_warped[:, 1] < -5000]  # above horizon: => [0,0,1]
    total_mask = masks[0] | masks[1] | masks[2] | masks[3]
    UV_warped[total_mask] = np.array([[0.0, 0.0, 1.0]])
    # show plot
    X_warped = UV_warped[:, 0].reshape((height, width))
    Y_warped = UV_warped[:, 1].reshape((height, width))
    gray_img = gray_img[:-1, :-1]
    # add colors
    rgb = img[:, :-1, :].reshape((-1, 3)) / 255.0  # we have 1 less face than grid cells
    rgba = np.concatenate((rgb, np.ones((rgb.shape[0], 1))), axis=1)
    plotimg = ax.pcolormesh(X_warped, Y_warped, img.mean(-1)[:, :], cmap='Greys')
    plotimg.set_array(None)
    plotimg.set_edgecolor('none')
    plotimg.set_facecolor(rgba)

ax.set_aspect('equal')
plt.show()
I have the feeling that numpy.meshgrid is quite memory intensive, but I'm not sure. Does anybody see where my memory gets eaten away so rapidly? (BTW, I have a laptop with 12 GB of RAM, only a small part of which is used by other programs.)
You might want to profile your code with this library.
It will show you where your script is using memory.
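For instance, if the library meant here is memory_profiler (an assumption on my part; the original link is not preserved), a minimal usage sketch looks like this:

# Sketch assuming the linked library is memory_profiler (pip install memory-profiler)
import numpy as np
from memory_profiler import profile

@profile                  # prints line-by-line memory usage when the function runs
def build_grid(n):
    U, V = np.meshgrid(range(n), range(n))   # two large intermediate arrays
    return U + V

if __name__ == "__main__":
    build_grid(2048)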
There is a Stack Overflow question here about memory profilers. Also, I've used the trick in this answer in the past as a quick way to get an idea of where in the code memory is going out of control: I just print the resource.getrusage() results all over the place. It's not clean and it doesn't always work, but it's part of the standard library and it's easy to use.
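Concretely, that trick looks something like the following (a sketch; the resource module is Unix-only, and ru_maxrss is reported in kilobytes on Linux but in bytes on macOS):

import resource

def log_memory(tag=""):
    # Peak resident set size of this process so far
    print(tag, resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)

log_memory("before meshgrid")
# ... allocate the big arrays here ...
log_memory("after meshgrid")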
I ordinarily profile with the profile and cProfile modules, as they make testing individual sections of code fairly easy.
Python Profilers
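A minimal cProfile sketch (it measures time rather than memory, so it complements a memory profiler rather than replacing it):

import cProfile
import pstats

def work():
    return sum(i * i for i in range(10**6))

cProfile.run("work()", "profile.out")           # profile one call, save the stats
stats = pstats.Stats("profile.out")
stats.sort_stats("cumulative").print_stats(5)   # show the 5 most expensive entries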

Numpy Histogram - Python

I have a bunch of images for which I have to generate histograms, but I have to generate a histogram for each pixel: for a collection of n images, I have to count the values that pixel (0,0) assumed and generate a histogram, then the same for (0,1), (0,2), and so on. I coded the following method to do this:
class ImageData:
    def generate_pixel_histogram(self, images, bins):
        """
        Generate a histogram of the image for each pixel, counting
        the values assumed for each pixel in the specified bins
        """
        max_value = 0.0
        min_value = 0.0
        for i in range(len(images)):
            image = images[i]
            max_entry = max(max(p[1:]) for p in image.data)
            min_entry = min(min(p[1:]) for p in image.data)
            if max_entry > max_value:
                max_value = max_entry
            if min_entry < min_value:
                min_value = min_entry
        interval_size = (math.fabs(min_value) + math.fabs(max_value))/bins
        for x in range(self.width):
            for y in range(self.height):
                pixel_histogram = {}
                for i in range(bins+1):
                    key = round(min_value+(i*interval_size), 2)
                    pixel_histogram[key] = 0.0
                for i in range(len(images)):
                    image = images[i]
                    value = round(Utils.get_bin(image.data[x][y], interval_size), 2)
                    pixel_histogram[value] += 1.0/len(images)
                self.data[x][y] = pixel_histogram
Each position of the matrix stores a dictionary representing a histogram. But since I do this for each pixel, and the calculation takes considerable time, this seems like a good problem to parallelize. I don't have experience with that, though, and I don't know how to do it.
EDIT:
I tried what @Eelco Hoogendoorn suggested and it works perfectly. But applying it to my code, where the data are a large number of images generated with this constructor (after the values are calculated and not just 0 anymore), I just get h as an array of zeros [0 0 0]. What I pass to the histogram method is an array of ImageData.
class ImageData(object):
    def __init__(self, width=5, height=5, range_min=-1, range_max=1):
        """
        The ImageData constructor
        """
        self.width = width
        self.height = height
        # The values range each pixel can assume
        self.range_min = range_min
        self.range_max = range_max
        self.data = np.arange(width*height).reshape(height, width)
# Another class, just the method here
def generate_pixel_histogram(realizations, bins):
    """
    Generate a histogram of the image for each pixel, counting
    the values assumed for each pixel in the specified bins
    """
    data = np.array([image.data for image in realizations])
    min_max_range = data.min(), data.max()+1
    bin_boundaries = np.empty(bins+1)
    # Function to wrap np.histogram, passing on only the first return value
    def hist(pixel):
        h, b = np.histogram(pixel, bins=bins, range=min_max_range)
        bin_boundaries[:] = b
        return h
    # Apply this for each pixel
    hist_data = np.apply_along_axis(hist, 0, data)
    print hist_data
    print bin_boundaries
Now I get:
hist_data = np.apply_along_axis(hist, 0, data)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/lib/shape_base.py", line 104, in apply_along_axis
outshape[axis] = len(res)
TypeError: object of type 'NoneType' has no len()
Any help would be appreciated.
Thanks in advance.
As noted by john, the most obvious solution is to look for library functionality that will do this for you. It exists, and it will be orders of magnitude more efficient than what you are doing here.
Standard numpy has a histogram function that can be used for this purpose. If you have only a few values per pixel, it will be relatively inefficient, and it creates a dense histogram vector rather than the sparse one you produce here. Still, chances are good that the code below solves your problem efficiently.
import numpy as np

# some example data; 128 images of 4x4 pixels
voxeldata = np.random.randint(0, 100, (128, 4, 4))

# we need to apply the same binning range to each pixel to get sensible output
globalminmax = voxeldata.min(), voxeldata.max()+1

# number of output bins
bins = 20

bin_boundaries = np.empty(bins+1)
# function to wrap np.histogram, passing on only the first return value
def hist(pixel):
    h, b = np.histogram(pixel, bins=bins, range=globalminmax)
    bin_boundaries[:] = b  # simply overwrite; result should be identical each time
    return h

# apply this for each pixel
histdata = np.apply_along_axis(hist, 0, voxeldata)
print bin_boundaries
print histdata[:, 0, 0]  # print the histogram of an arbitrary pixel
But the more general message I'd like to convey, looking at your code sample and the type of problem you are working on, is: do yourself a favor and learn numpy.
Parallelization certainly would not be my first port of call in optimizing this kind of thing. Your main problem is that you're doing lots of looping at the Python level. Python is inherently slow at this kind of thing.
One option would be to learn how to write Cython extensions and write the histogram bit in Cython. This might take you a while.
Actually, taking a histogram of pixel values is a very common task in computer vision, and it has already been efficiently implemented in OpenCV (which has Python wrappers). There are also several functions for taking histograms in the numpy package (though they are slower than the OpenCV implementations).
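For reference, a minimal sketch of the OpenCV route (an illustration of my own, not from the original answer; it computes one histogram over a whole grayscale image, not the per-pixel histograms discussed above, and 'image.jpg' is a hypothetical input file):

import cv2

img = cv2.imread('image.jpg', cv2.IMREAD_GRAYSCALE)  # hypothetical input file
# 256 bins over the value range [0, 256)
hist = cv2.calcHist([img], [0], None, [256], [0, 256])
print(hist.shape)  # (256, 1)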
