OpenCV Python: Reading and setting every pixel too slow - python

I'm trying to track motion with OpenCV in Python. If a pixel doesn't match the color it had in the last frame it gets set to black; if it's static it is set to white. I got this working pretty well, but if I process every pixel one by one, performance takes a big hit and it runs too slow.
My code:
import numpy as np
import cv2

def distance(b1, g1, r1, b2, g2, r2):
    return abs(b2 - b1) + abs(g2 - g1) + abs(r2 - r1)

pixelStep = 1
lastFrame = None
thresh = 100
cap = cv2.VideoCapture(0)

while True:
    flag, frame = cap.read()
    frameInst = frame.copy()
    height = np.size(frame, 0)
    width = np.size(frame, 1)
    if lastFrame is not None:
        for x in range(0, height, pixelStep):
            for y in range(0, width, pixelStep):
                b1 = lastFrame.item(x, y, 0)
                g1 = lastFrame.item(x, y, 1)
                r1 = lastFrame.item(x, y, 2)
                b2 = frame.item(x, y, 0)
                g2 = frame.item(x, y, 1)
                r2 = frame.item(x, y, 2)
                dist = distance(b2, g2, r2, b1, g1, r1)
                colorValue = 255
                if dist > thresh:
                    colorValue = 0  # change to black if the pixel changed from the last frame
                frame.itemset(x, y, 0, colorValue)
                frame.itemset(x, y, 1, colorValue)
                frame.itemset(x, y, 2, colorValue)
    lastFrame = frameInst
    cv2.imshow('frame', frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()
If I change pixelStep to 3 it'll run fast and feel right. Am I doing this right or do I need to approach this in a different way?

As a rule of thumb, any Python program that touches every pixel in pure Python loops will be too slow. OpenCV has a huge number of functions -- use them! In your case, you can chain four functions:
'absdiff' (http://docs.opencv.org/modules/core/doc/operations_on_arrays.html#absdiff) to get an image containing only the differences
'split' to get three separate arrays for the B, G, R differences
'add' to add them all together
'threshold' (http://docs.opencv.org/modules/imgproc/doc/miscellaneous_transformations.html#threshold) to turn that into a single array that shows where the differences are
If you then want to find the areas where the differences are, you can use a blob detector to give you a list of areas.
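For illustration, here is a minimal sketch of how those four calls might replace the nested loops, keeping the question's thresh = 100 and its black-for-motion / white-for-static convention; the single-channel mask is shown directly instead of writing the value back into all three channels of the frame:
import cv2

cap = cv2.VideoCapture(0)
thresh = 100
lastFrame = None

while True:
    flag, frame = cap.read()
    if not flag:
        break
    if lastFrame is not None:
        diff = cv2.absdiff(frame, lastFrame)           # per-channel |frame - lastFrame|
        b, g, r = cv2.split(diff)                      # separate B, G, R difference planes
        total = cv2.add(cv2.add(b, g), r)              # summed differences (saturates at 255)
        # moving pixels (sum > thresh) become black, static pixels white
        _, mask = cv2.threshold(total, thresh, 255, cv2.THRESH_BINARY_INV)
        cv2.imshow('frame', mask)
    lastFrame = frame.copy()
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()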

Related

How to rotate an image to get not-null pixels?

In the image I linked below, I need to get all the yellow/green pixels in this rotated rectangle and get rid of the blue background, so that the rectangle's axes are aligned with the x and y axes.
I'm using numpy but don't have a clue what I should do.
I uploaded the array to this drive in case anyone would like to work with the actual array.
Thanks in advance for the help.
I used the same image as user2640045, but a different approach.
import numpy as np
import cv2
# load and convert image to grayscale
img = cv2.imread('image.png')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# binarize image
threshold, binarized_img = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
# find the largest contour
contours, hierarchy = cv2.findContours(binarized_img, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
c = max(contours, key = cv2.contourArea)
# get size of the rotated rectangle
center, size, angle = cv2.minAreaRect(c)
# get size of the image
h, w, *_ = img.shape
# create a rotation matrix and rotate the image
M = cv2.getRotationMatrix2D(center, angle, 1.0)
rotated_img = cv2.warpAffine(img, M, (w, h))
# crop the image
pad_x = int((w - size[0]) / 2)
pad_y = int((h - size[1]) / 2)
cropped_img = rotated_img[pad_y : pad_y + int(size[1]), pad_x : pad_x + int(size[0]), :]
Result:
I realize there is an allow_pickle=False option in numpy's load method, but I didn't feel comfortable unpickling/using data from the internet, so I used the small image. After removing the coordinate system and such I had:
I define two helper methods: one to rotate the image, taken from another Stack Overflow thread (see the link in the code), and one to get a mask that is one at a specified color and zero otherwise.
import numpy as np
import matplotlib.pyplot as plt
import sympy
import cv2
import functools

color = arr[150, 50]

def similar_to_boundary_color(arr, color=tuple(color)):
    mask = functools.reduce(np.logical_and, [np.isclose(arr[:, :, i], color[i]) for i in range(4)])
    return mask

# https://stackoverflow.com/a/9042907/2640045
def rotate_image(image, angle):
    image_center = tuple(np.array(image.shape[1::-1]) / 2)
    rot_mat = cv2.getRotationMatrix2D(image_center, angle, 1.0)
    result = cv2.warpAffine(image, rot_mat, image.shape[1::-1], flags=cv2.INTER_LINEAR)
    return result
Next I calculate the angle to rotate about. I do that by finding the lowest pixel at widths 50 and 300. I picked those since they are far enough from the boundary not to be affected by missing corners etc.
i,j = np.where(~similar_to_boundary_color(arr))
slope = (max(i[j == 50])-max(i[j == 300]))/(50-300)
angle = np.arctan(slope)
arr = rotate_image(arr, np.rad2deg(angle))
plt.imshow(arr)
One way of doing the cropping is the following. You calculate the midpoint in height and width. Then you take two slices around the midpoint, say 20 pixels wide in one direction, reaching up to the midpoint from either side in the other. The biggest/smallest index where the pixel is white/background colored is a reasonable point to cut.
i,j = np.where(~(~similar_to_boundary_color(arr) & ~similar_to_boundary_color(arr, (0,0,0,0))))
imid, jmid = np.array(arr.shape)[:2]/2
imin = max(i[(i < imid) & (jmid - 10 < j) & (j < jmid + 10)])
imax = min(i[(i > imid) & (jmid - 10 < j) & (j < jmid + 10)])
jmax = min(j[(j > jmid) & (imid - 10 < i) & (i < imid + 10)])
jmin = max(j[(j < jmid) & (imid - 10 < i) & (i < imid + 10)])
arr = arr[imin:imax,jmin:jmax]
plt.imshow(arr)
and the result is:

Object detection model for detecting rectangular shape (text cursor) in a video?

I'm currently doing some research to detect and locate a text-cursor (you know, the blinking rectangle shape that indicates the character position when you type on your computer) in a screen-recorded video. To do that, I've trained a YOLOv4 model with a custom object dataset (I took a reference from here) and am planning to also implement DeepSORT to track the moving cursor.
Here's the example of training data I used to train YOLOv4:
Here's what I want to achieve:
Do you think using YOLOv4 + DeepSORT is overkill for this task? I'm asking because as of now, only 70%-80% of the video frames that contain the text-cursor are successfully detected by the model. If it is overkill after all, do you know any other method that could be implemented for this task?
Anyway, I'm planning to detect the text-cursor not only in the Visual Studio Code window, but also in browsers (e.g., Google Chrome) and text processors (e.g., Microsoft Word). Something like this:
I'm considering the Sliding Window method as an alternative, but from what I've read, it might consume a lot of resources and perform more slowly. I'm also considering Template Matching from OpenCV (like this), but I don't think it will perform better or faster than YOLOv4.
The constraints are processing speed (i.e., how many frames can be processed in a given amount of time) and detection accuracy (i.e., I want to avoid the letter 'l' or '1' being detected as the text-cursor, since those characters look similar in some fonts). But higher accuracy with lower FPS is acceptable, I think.
I'm currently using Python, Tensorflow, and OpenCV for this.
Thank you very much!
This would work if the cursor is the only moving object on the screen. Here is the before and after:
Before:
After:
The code:
import cv2
import numpy as np

BOX_WIDTH = 10
BOX_HEIGHT = 20

def process_img(img):
    img_gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    kernel = np.ones((5, 5))
    img_canny = cv2.Canny(img_gray, 50, 50)
    return img_canny

def get_contour(img):
    contours, hierarchies = cv2.findContours(img, cv2.RETR_TREE, cv2.CHAIN_APPROX_NONE)
    if contours:
        return max(contours, key=cv2.contourArea)

def get_line_tip(cnt1, cnt2):
    x1, y1, w1, h1 = cv2.boundingRect(cnt1)
    if h1 > BOX_HEIGHT / 2:
        if np.any(cnt2):
            x2, y2, w2, h2 = cv2.boundingRect(cnt2)
            if x1 < x2:
                return x1, y1
        return x1 + w1, y1

def get_rect(x, y):
    half_width = BOX_WIDTH // 2
    lift_height = BOX_HEIGHT // 6
    return (x - half_width, y - lift_height), (x + half_width, y + BOX_HEIGHT - lift_height)

cap = cv2.VideoCapture("screen_record.mkv")
success, img_past = cap.read()

cnt_past = np.array([])
line_tip_past = 0, 0

while True:
    success, img_live = cap.read()
    if not success:
        break
    img_live_processed = process_img(img_live)
    img_past_processed = process_img(img_past)
    img_diff = cv2.bitwise_xor(img_live_processed, img_past_processed)
    cnt = get_contour(img_diff)
    line_tip = get_line_tip(cnt, cnt_past)
    if line_tip:
        cnt_past = cnt
        line_tip_past = line_tip
    else:
        line_tip = line_tip_past
    rect = get_rect(*line_tip)
    img_past = img_live.copy()
    cv2.rectangle(img_live, *rect, (0, 0, 255), 2)
    cv2.imshow("Cursor", img_live)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cv2.destroyAllWindows()
Breaking it down:
Import the necessary libraries:
import cv2
import numpy as np
Define the size of the tracking box depending on the size of the cursor:
BOX_WIDTH = 10
BOX_HEIGHT = 20
Define a function to process the frames into edges:
def process_img(img):
    img_gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    kernel = np.ones((5, 5))
    img_canny = cv2.Canny(img_gray, 50, 50)
    return img_canny
Define a function that would retrieve the contour with the greatest area in an image (the cursor doesn't need to be large for this to work, it can be tiny if needed):
def get_contour(img):
    contours, hierarchies = cv2.findContours(img, cv2.RETR_TREE, cv2.CHAIN_APPROX_NONE)
    if contours:
        return max(contours, key=cv2.contourArea)
Define a function that will take in two contours: one being the contour of the cursor plus some stray text from the current frame, the other being the contour of the cursor plus some stray text from the frame before. With the two contours, we can identify whether the cursor is moving left or right:
def get_line_tip(cnt1, cnt2):
    x1, y1, w1, h1 = cv2.boundingRect(cnt1)
    if h1 > BOX_HEIGHT / 2:
        if np.any(cnt2):
            x2, y2, w2, h2 = cv2.boundingRect(cnt2)
            if x1 < x2:
                return x1, y1
        return x1 + w1, y1
Define a function that will take in the tip points of the cursor, and return a box based on the BOX_WIDTH and BOX_HEIGHT constants defined earlier:
def get_rect(x, y):
    half_width = BOX_WIDTH // 2
    lift_height = BOX_HEIGHT // 6
    return (x - half_width, y - lift_height), (x + half_width, y + BOX_HEIGHT - lift_height)
Define a capture device for the video, read one frame from the start of the video, and store it in a variable that will serve as the frame before the current frame. Also define placeholder values for the past contour and past line tip:
cap = cv2.VideoCapture("screen_record.mkv")
success, img_past = cap.read()
cnt_past = np.array([])
line_tip_past = 0, 0
Use a while loop, and read from the video. Process the frame and the frame before that frame in the video:
while True:
    success, img_live = cap.read()
    if not success:
        break
    img_live_processed = process_img(img_live)
    img_past_processed = process_img(img_past)
With the processed frames, we can find the difference between the frames using the cv2.bitwise_xor method to get where the movement is on the screen. Then, we can find the contour of the movement between the two frames using the get_contour function defined earlier:
    img_diff = cv2.bitwise_xor(img_live_processed, img_past_processed)
    cnt = get_contour(img_diff)
With the contour, we can utilize the get_line_tip function defined earlier to find the tip of the cursor. If a tip was found, save it into the line_tip_past variable to use for the next iteration; if a tip was not found, we can use the past tip we saved as the current tip:
    line_tip = get_line_tip(cnt, cnt_past)
    if line_tip:
        cnt_past = cnt
        line_tip_past = line_tip
    else:
        line_tip = line_tip_past
Now we define a rectangle using the cursor tip and the get_rect function we defined earlier, and draw it onto the current frame. But before drawing it on, we save the frame to be the frame before the current frame of the next iteration:
    rect = get_rect(*line_tip)
    img_past = img_live.copy()
    cv2.rectangle(img_live, *rect, (0, 0, 255), 2)
Finally, we display the frame:
cv2.imshow("Cursor", img_live)
if cv2.waitKey(1) & 0xFF == ord("q"):
break
cv2.destroyAllWindows()

Efficiently transform and blend images in OpenCV

I'm making a program to simulate brush strokes on a given image, and I'm held up on the part that actually puts the brush strokes down. Right now I'm using OpenCV and Numpy to rotate and mix images together by setting a section of the original image to be the new one. Here's the current code.
def rotate(img, deg):
    """Rotates an image a certain number of degrees"""
    h, w, c = img.shape
    matrix = cv2.getRotationMatrix2D((w / 2, h / 2), deg, 1)
    out = cv2.warpAffine(img, matrix, (w, h), flags=cv2.INTER_LINEAR)
    return out

def paste(img1, img2, mask, x, y):
    """Pastes one image on top of another at a given location"""
    h1, w1, _ = img1.shape  # dimensions of original image
    h2, w2, _ = img2.shape  # dimensions of pasted image
    if y > h1 or x > w1:
        return img1
    stamp = img2
    factor = mask
    subx = ((w2 + x) - w1)
    suby = ((h2 + y) - h1)
    if suby <= 0:
        suby = 0
    if subx <= 0:
        subx = 0
    stamp = img2[0:h2 - suby, 0:w2 - subx]
    factor = mask[0:h2 - suby, 0:w2 - subx]
    snippet = img1[y:y + (h2 - suby), x:x + (w2 - subx)]
    stamp = cv2.add(cv2.multiply(snippet, (1 - factor)), cv2.multiply(stamp, factor))
    out = img1
    out[y:y + h2, x:x + w2] = stamp
    return out
It works, but only barely. There are issues that come up with certain scenarios, it's very limited as to what you're allowed to do with it, and is extremely slow. Is there an easier/better way to transform and blend images with OpenCV, or would a different framework (sprite/object based, like pygame) be better for this? Thanks!
I suggest you try seamless cloning. Check this excellent tutorial from Satya Mallick: https://www.learnopencv.com/seamless-cloning-using-opencv-python-cpp/
In this case you can substitute the plane in the tutorial for your brush.
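For reference, a minimal sketch of what seamless cloning could look like for a brush stamp; the file names, full-stamp mask, and paste position below are illustrative assumptions, not part of the original question:
import cv2
import numpy as np

# Illustrative file names: a BGR brush stamp and a canvas image.
brush = cv2.imread('brush.png')
canvas = cv2.imread('canvas.png')

# Mask is nonzero where the stamp should be blended in (here: the whole stamp).
mask = 255 * np.ones(brush.shape[:2], dtype=np.uint8)

# Center of the stamp in canvas coordinates; the stamp must fit inside the canvas.
center = (200, 150)

# Poisson-blend the stamp onto the canvas.
result = cv2.seamlessClone(brush, canvas, mask, center, cv2.NORMAL_CLONE)
cv2.imwrite('blended.png', result)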

Overlap misplaced images for maximum similarity and find differences using clustering techniques rather than pixel-to-pixel comparison

I am working on comparing images which are 99% the same with a 1% difference.
I am capturing an image of the print using a vision camera (mounted over a fixed stand).
I tried all the image comparison algorithms: OpenCV, ImageMagick, skimage (the results were 80 to 90 percent accuracy).
link : “Diff” an image using ImageMagick
link : How can I quantify difference between two images?
I implemented all the solutions from the above questions to find the difference, but the problem with those algorithms is that they work pixel to pixel. None of them provides a smarter approach to image comparison.
After capturing images of two different prints of the same type, I do the following steps for image comparison.
My code for overlapping misplaced images for maximum similarity (image alignment) is:
import cv2
import numpy as np

# load image
img = cv2.imread('./photo/image.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)  # convert to grayscale
retval, thresh_gray = cv2.threshold(gray, 100, maxval=255, type=cv2.THRESH_BINARY_INV)
contours, hierarchy = cv2.findContours(thresh_gray, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)

mx_rect = (0, 0, 0, 0)
mx_area = 0
for cnt in contours:
    arect = cv2.minAreaRect(cnt)
    area = arect[1][0] * arect[1][1]
    if area > mx_area:
        mx_rect, mx_area = arect, area

x, y, w, h = cv2.boundingRect(cnt)
# cv2.rectangle(img,(x,y),(x+w,y+h),(0,255,0),8)
roi_1 = img[y:y+h, x:x+w]
cv2.imwrite('./test/Image_rec.jpg', roi_1)

print("shape of cnt: {}".format(cnt.shape))
rect = cv2.minAreaRect(cnt)
print("rect: {}".format(rect))
box = cv2.boxPoints(rect)
box = np.int0(box)

width = int(rect[1][0])
height = int(rect[1][1])
src_pts = box.astype("float32")
dst_pts = np.array([[0, height-1],
                    [0, 0],
                    [width-1, 0],
                    [width-1, height-1]], dtype="float32")
M = cv2.getPerspectiveTransform(src_pts, dst_pts)
warped = cv2.warpPerspective(img, M, (width, height))
cv2.imwrite('./crop_Image_rortate.jpg', warped)
The above code gives the required image, i.e. it tries to align the image and crops the required region, but it sometimes fails as well (2 out of 10 times).
Once the image is cropped, I compare it to find the difference using clustering techniques. My code for comparison is as follows:
from PIL import Image
import numpy as np
import cv2
import scipy.misc as smp

f1 = './Image_1.png'
f2 = './Image_2.png'

im1 = Image.open(f1)
im2 = Image.open(f2)
img1 = cv2.imread(f1)
img2 = cv2.imread(f2)
# print (img1.shape)
# print (img2.shape)

w_1 = img1.shape[0]
h_1 = img1.shape[1]
W_1 = w_1 - 1
H_1 = h_1 - 1

c = 0
X = []
Y = []
R = []
G = []
B = []

rgb = im1.convert('RGB')
rgb2 = im2.convert('RGB')

for x in range(H_1):
    for y in range(W_1):
        r1, g1, b1 = rgb.getpixel((x, y))
        t1 = r1 + g1 + b1
        i = x
        j = y
        r2, g2, b2 = rgb2.getpixel((i, j))
        t2 = r2 + g2 + b2
        d = t1 - t2
        if d in range(-150, 150):
            # print (d)
            pass
        else:
            c = c + 1
            if (c == 1):
                z = y
            elif (y == z + 1):
                # print (x,y)
                i = x + 1
                j = y + 1
                r2, g2, b2 = rgb2.getpixel((i, j))
                t2 = r2 + g2 + b2
                d = t1 - t2
                if d in range(-150, 150):
                    # print (d)
                    pass
                else:
                    X.append(x)
                    Y.append(y)
                    R.append(r1)
                    G.append(g1)
                    B.append(b1)
                z = y
                z1 = y  # to make a group of 2

try:
    data = np.zeros((h_1, w_1, 3), dtype=np.uint8)
    length = len(X)
    print("total pixel difference : ", length)
    for i in range(length):
        data[X[i], Y[i]] = [R[i], G[i], B[i]]
    img = Image.fromarray(data, 'RGB')
    img.save('./test/new.png')
    img.show()
except:
    print("Error during image creation. ")
The above code tries to implement clustering-based image comparison, and it is also slow.
The comparison skips the first differing pixel of every row even if it is a real difference; it looks for major differences only.
But the problem still remains the same: pixel-to-pixel comparison.
Is there a proper clustering technique which will target the proper differences?
I don't want to do pixel-to-pixel image comparison, as it provides incorrect results for me.
I am also open to other techniques for image comparison, if available and not listed above.
Image Sample :
Image 1:
Image 1
Image 2:
Image 2
Accepted Output:
Accepted output
output after difference :
output
Thanks.

Changing pixel colors in Python: How to do it faster?

I have an RGB image and am trying to set every pixel on my RGB to black where the corresponding alpha pixel is black as well. So basically I am trying to "bake" the alpha into my RGB.
I have tried this using PIL pixel access objects, PIL ImageMath.eval and numpy arrays:
PIL pixel access objects:
def alphaCutoutPerPixel(im):
    pixels = im.load()
    for x in range(im.size[0]):
        for y in range(im.size[1]):
            px = pixels[x, y]
            r, g, b, a = px
            if px[3] == 0:  # If alpha is black...
                pixels[x, y] = (0, 0, 0, 0)
    return im
PIL ImageMath.eval:
def alphaCutoutPerBand(im):
    newBands = []
    r, g, b, a = im.split()
    for band in (r, g, b):
        out = ImageMath.eval("convert(min(band, alpha), 'L')", band=band, alpha=a)
        newBands.append(out)
    newImg = Image.merge("RGB", newBands)
    return newImg
Numpy array:
def alphaCutoutNumpy(im):
    data = numpy.array(im)
    r, g, b, a = data.T
    blackAlphaAreas = (a == 0)
    # This fails; why?
    data[..., :-1][blackAlphaAreas] = (0, 255, 0)
    return Image.fromarray(data)
The first method works fine, but is really slow. The second method works fine for a single image, but will stop after the first when asked to convert multiple. The third method I created based on this example (first answer): Python: PIL replace a single RGBA color
But it fails at the marked command:
data[..., :-1][blackAlphaAreas] = (0, 255, 0, 0)
IndexError: index (295) out of range (0<=index<294) in dimension 0
Numpy seems promising for this kind of stuff, but I don't really get the syntax for setting parts of the array in one step. Any help? Maybe other ideas to achieve what I describe above quickly?
Cheers
This doesn't use advanced indexing but is easier to read, imho:
def alphaCutoutNumpy(im):
    data = numpy.array(im)
    data_T = data.T
    r, g, b, a = data_T
    blackAlphaAreas = (a == 0)
    data_T[0][blackAlphaAreas] = 0
    data_T[1][blackAlphaAreas] = 0
    data_T[2][blackAlphaAreas] = 0
    #data_T[3][blackAlphaAreas] = 255
    return Image.fromarray(data[:, :, :3])
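As a side note, the asker's advanced-indexing attempt probably fails because data.T transposes the array, so blackAlphaAreas ends up shaped (width, height) while data[..., :-1] is indexed as (height, width). A sketch that builds the mask without transposing (assuming an RGBA input image):
import numpy
from PIL import Image

def alphaCutoutNumpyMasked(im):
    data = numpy.array(im)               # shape (height, width, 4) for RGBA
    blackAlphaAreas = data[..., 3] == 0  # boolean mask, shape (height, width)
    data[blackAlphaAreas, :3] = 0        # zero R, G, B wherever alpha is 0
    return Image.fromarray(data)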
