I'm currently trying to write a script to detect text in an OBS video stream using Python/OpenCV.
From every n-th frame, I need to detect text within several specific boundaries (an example can be found in the attachment). The coordinates of these boundaries are constant across all video frames.
My questions:
Is OpenCV the best approach to solve my task?
What OpenCV function should I use to specify multiple boundaries for text detection?
Is there a way to use a video stream from OBS as an input to my script?
Thank you for your help!
I can't say anything about OBS, but OpenCV + Tesseract should be all you need. Since you know the location of the text very precisely, it will be very easy to use. Here is a quite comprehensive tutorial on using both, which includes bits on finding where the text is in the image.
The code could look like this:
import cv2
import pytesseract

img = cv2.imread("...")  # or wherever you get your image from
region = [100, 200, 200, 400]  # [y1, x1, y2, x2] of the region containing text
# Tesseract expects RGB; OpenCV uses BGR
img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
output = pytesseract.image_to_string(img_rgb[region[0]:region[2], region[1]:region[3]])
The only other step that might be required is to invert the image in order to get dark text on a light background. Those tips can be found here. For example, removing the red background in one of the boxes you highlighted might help with accuracy, which can be achieved by thresholding on red values: img_rgb[img_rgb[..., 0] > 250] = [255, 255, 255].
As for reading your images in, this other question might help.
I'm not very good with programming. Is there any way to select and mask lines of a certain length, and basic shapes like circles, parabolas, squares, etc.? I then want to use a selected (mouse-clicked) line/shape in another pipeline, so preferably it would be masked and stored in another image.
I currently have this basic OpenCV Python code, which is able to give me a traced-out image:
import cv2

# Read in the image on which the operations are
# to be done. Make sure the image is in the same
# directory as this Python program.
img = cv2.imread('/workspaces/85332242/Personal/Unknown.jpeg')
# Convert the image to grayscale
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# Apply edge detection to the image
edges = cv2.Canny(gray, 50, 200, apertureSize=3)
cv2.imwrite('linesDetected.jpg', edges)
This produces:
I'd like to be able to click and select separate lines in this image.
Any help would be much appreciated!
I want to create a basic video editing application where the user can import video clips and then use symmetry (vertical or horizontal) and offsets on their videos. How feasible is this?
For instance, consider the following image:
Right-symmetry:
Image offset to the top-left:
If that last image is confusing, basically you can think of it as the images repeating one next to the other in a grid, infinitely, such that they're symmetric. Then, you can select a window of this grid equal to the size of the original image. E.g. the red square represents the window:
This is very feasible. OpenCV can do all of this frame by frame, although it would probably take some time for high-quality/long videos. If you want to know how to do these operations, I would open separate questions. Mirroring, for example, can be done with cv2.flip().
You can use the cv2.flip() method from the cv2 library. First read the image with cv2.imread(path). Then, to create the mirror effect, call cv2.flip(image, 1); a flip code of 1 mirrors around the vertical axis, while 0 flips around the horizontal axis.
As shown below:
image = cv2.imread(path)
mirror = cv2.flip(image, 1)
I'm currently learning about computer vision OCR. I have an image that needs to be scanned, but I've run into a problem during image cleaning.
I use OpenCV in Python. This is the original image:
image = cv2.imread(image_path)
cv2.imshow("imageWindow", image)
I want to clean up the above image; the number in the middle (64) is the area I want to scan. However, the number gets cleaned away as well.
image[np.where((image > [0, 0, 105]).all(axis=2))] = [255, 255, 255]
cv2.imshow("imageWindow", image)
What should I do to correct the cleaning here? I want the screen area where the number 64 is located to be cleaned up, because I will perform an OCR scan afterwards.
Please help, thank you in advance.
What you're trying to do is called "thresholding". Looks like your technique is recoloring pixels that fall below a certain threshold, but the LCD digit darkness varies enough in that image to throw it off.
I'd spend some time reading about thresholding, here's a good starting place:
Thresholding in OpenCV with Python. You're probably going to need an adaptive technique (like Adaptive Gaussian Thresholding), but you may find other ways that work for your images.
Considering I already have the coordinates of the area of the image I want to do image processing on: it was already explained here using Rect, but how do you do this in Python with OpenCV 3?
From the link you gave, it seems you don't want the output in a different image variable, given that you know the coordinates of the region you want to process. I'll assume your image processing function to be cv2.blur() so this is how it'll be:
image[y:y+height, x:x+width] = cv2.blur(image[y:y+height, x:x+width], (11, 11))
Here, x and y are your ROI's starting coordinates, and height and width are the height and width of the ROI.
Hope this is what you wanted, or if it's anything different, provide more details in your question.
It would be very useful if you would provide more details and maybe some code you've tried.
From my understanding, you want to do image processing on a region of an image array only. You can do something like
foo(im[i1:i2, j1:j2, :])
Where foo is your image processing function.
I want to convert the picture into a black-and-white image accurately, where the seeds are represented by white and the background by black. I would like to have it in Python OpenCV code. Please help me out.
I got a good result for the above picture using the code below. Now I have another picture for which thresholding doesn't seem to work. How can I tackle this problem? The output I got is in the following picture.
Also, there are some dents in the seeds, which the program takes as the boundary of the seed; that is not a good result, as in the picture below. How can I make the program ignore the dents? Is masking the seeds a good option in this case?
I converted the image from BGR color space to HSV color space.
Then I extracted the hue channel:
Then I performed threshold on it:
Note:
Whenever you face difficulty in certain areas try working in a different color space, the HSV color space being most prominent.
UPDATE:
Here is the code:
import cv2
import numpy as np

filename = 'seed.jpg'
img = cv2.imread(filename)  #---Reading image file---
hsv_img = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)  #---Converting BGR image to HSV---
hue, saturation, value = cv2.split(hsv_img)  #---Splitting HSV image to 3 channels---
blur = cv2.GaussianBlur(hue, (3, 3), 0)  #---Blur to smooth the edges---
ret, th = cv2.threshold(blur, 38, 255, 0)  #---Binary threshold---
cv2.imshow('th.jpg', th)
cv2.waitKey(0)
Now you can perform contour operations to highlight your regions of interest also. Try it out!! :)
ANOTHER UPDATE:
I then kept only the contours above a certain area constraint to get this:
There are countless ways for image segmentation.
The simplest one is a global threshold operation. If you want to know more about other methods, you should read some books, which I recommend anyway before you do any further image processing. It doesn't make much sense to start image processing if you don't know the most basic tools.
Just to show you how this could be achieved:
I converted the image from RGB to HSB. I then applied separate global thresholds to the hue and brightness channels to get the best segmentation result for both images.
Both binary images were then combined using a pixelwise AND operation. I did this because both channels gave sub-optimal results, but their overlap was pretty good.
I also applied some morphological operators to clean up the results.
Of course you can just invert the image to get the desired black background...
Thresholds and the channels used of course depend on the image you have and what you want to achieve. This is a very case-specific process that can be dynamically adapted only to a limited extent.
This could be followed by labeling or whatever you need: