I am trying to find a way to show an image without the dependency of waitKey(). I want the image to be shown and continue on with next operations (like a plot using matplotlib). How can this be achieved?
If you're wanting to show the window and have the program continue execution without relying on cv2.waitKey(), then cv2.startWindowThread() is what you're looking for.
Example:
import cv2
img = cv2.imread("C:\\Test\\so1.png")
cv2.imshow("Test", img)
cv2.startWindowThread()
for x in range(0, 10000000):
print(x)
This will display the image and continue execution without using waitKey
I tried multiple methods, but what worked was using matplotlib
import matplotlib.pyplot as plt
#obtain I as a numpy array
plt.imshow(cv2.cvtColor(I, cv2.COLOR_BGR2RGB))
The problem
Im trying to capture my desktop with OpenCV and have Tesseract OCR find text and set it as a variable, for example, if I was going to play a game and have the capturing frame over a resource amount, I want it to print that and use it. A perfect example of this is a video by Micheal Reeves
where whenever he loses health in a game it shows it and sends it to his Bluetooth enabled airsoft gun to shoot him. So far I have this:
# imports
from PIL import ImageGrab
from PIL import Image
import numpy as np
import pytesseract
import argparse
import cv2
import os
fourcc = cv2.VideoWriter_fourcc(*'XVID')
out = cv2.VideoWriter("output.avi", fourcc, 5.0, (1366, 768))
while(True):
x = 760
y = 968
ox = 50
oy = 22
# screen capture
img = ImageGrab.grab(bbox=(x, y, x + ox, y + oy))
img_np = np.array(img)
frame = cv2.cvtColor(img_np, cv2.COLOR_BGR2RGB)
cv2.imshow("Screen", frame)
out.write(frame)
if cv2.waitKey(1) == 0:
break
out.release()
cv2.destroyAllWindows()
it captures real-time and displays it in a window but I have no clue how to make it recognise the text every frame and output it.
any help?
It's fairly simple to grab the screen and pass it to tesseract for OCRing.
The PIL (pillow) library can grab the frames easily on MacOS and Windows. However, this feature has only recently been added for Linux, so the code below works around it not existing. (I'm on Ubuntu 19.10 and my Pillow does not support it).
Essentially the user starts the program with screen-region rectangle co-ordinates. The main loop continually grabs this area of the screen, feeding it to Tesseract. If Tesseract finds any non-whitespace text in that image, it is written to stdout.
Note that this is not a proper Real Time system. There is no guarantee of timeliness, each frame takes as long as it takes. Your machine might get 60 FPS or it might get 6. This will also be greatly influenced by the size of the rectangle your ask it to monitor.
#! /usr/bin/env python3
import sys
import pytesseract
from PIL import Image
# Import ImageGrab if possible, might fail on Linux
try:
from PIL import ImageGrab
use_grab = True
except Exception as ex:
# Some older versions of pillow don't support ImageGrab on Linux
# In which case we will use XLib
if ( sys.platform == 'linux' ):
from Xlib import display, X
use_grab = False
else:
raise ex
def screenGrab( rect ):
""" Given a rectangle, return a PIL Image of that part of the screen.
Handles a Linux installation with and older Pillow by falling-back
to using XLib """
global use_grab
x, y, width, height = rect
if ( use_grab ):
image = PIL.ImageGrab.grab( bbox=[ x, y, x+width, y+height ] )
else:
# ImageGrab can be missing under Linux
dsp = display.Display()
root = dsp.screen().root
raw_image = root.get_image( x, y, width, height, X.ZPixmap, 0xffffffff )
image = Image.frombuffer( "RGB", ( width, height ), raw_image.data, "raw", "BGRX", 0, 1 )
# DEBUG image.save( '/tmp/screen_grab.png', 'PNG' )
return image
### Do some rudimentary command line argument handling
### So the user can speicify the area of the screen to watch
if ( __name__ == "__main__" ):
EXE = sys.argv[0]
del( sys.argv[0] )
# EDIT: catch zero-args
if ( len( sys.argv ) != 4 or sys.argv[0] in ( '--help', '-h', '-?', '/?' ) ): # some minor help
sys.stderr.write( EXE + ": monitors section of screen for text\n" )
sys.stderr.write( EXE + ": Give x, y, width, height as arguments\n" )
sys.exit( 1 )
# TODO - add error checking
x = int( sys.argv[0] )
y = int( sys.argv[1] )
width = int( sys.argv[2] )
height = int( sys.argv[3] )
# Area of screen to monitor
screen_rect = [ x, y, width, height ]
print( EXE + ": watching " + str( screen_rect ) )
### Loop forever, monitoring the user-specified rectangle of the screen
while ( True ):
image = screenGrab( screen_rect ) # Grab the area of the screen
text = pytesseract.image_to_string( image ) # OCR the image
# IF the OCR found anything, write it to stdout.
text = text.strip()
if ( len( text ) > 0 ):
print( text )
This answer was cobbled together from various other answers on SO.
If you use this answer for anything regularly, it would be worth adding a rate-limiter to save some CPU. It could probably sleep for half a second every loop.
Tesseract is a single-use command-line application using files for input and output, meaning every OCR call creates a new process and initializes a new Tesseract engine, which includes reading multi-megabyte data files from disk. Its suitability as a real-time OCR engine will depend on the exact use case—more pixels requires more time—and which parameters are provided to tune the OCR engine. Some experimentation may ultimately be required to tune the engine to the exact scenario, but also expect the time required to OCR for a frame may exceed the frame time and a reduction in the frequency of OCR execution may be required, i.e. performing OCR at 10-20 FPS rather than 60+ FPS the game may be running at.
In my experience, a reasonably complex document in a 2200x1700px image can take anywhere from 0.5s to 2s using the english fast model with 4 cores (the default) on an aging CPU, however this "complex document" represents the worst-case scenario and makes no assumptions on the structure of the text being recognized. For many scenarios, such as extracting data from a game screen, assumptions can be made to implement a few optimizations and speed up OCR:
Reduce the size of the input image. When extracting specific information from the screen, crop the grabbed screen image as much as possible to only that information. If you're trying to extract a value like health, crop the image around just the health value.
Use the "fast" trained models to improve speed at the cost of accuracy. You can use the -l option to specify different models and the --testdata-dir option to specify the directory containing your model files. You can download multiple models and rename the files to "eng_fast.traineddata", "eng_best.traineddata", etc.
Use the --psm parameter to prevent page segmentation not required for your scenario. --psm 7 may be the best option for singular pieces of information, but play around with different values and find which works best.
Restrict the allowed character set if you know which characters will be used, such as if you're only looking for numerics, by changing the whitelist configuration value: -c tessedit_char_whitelist='1234567890'.
pytesseract is the best way to get started with implementing Tesseract, and the library can handle image input directly (although it saves the image to a file before passing to Tesseract) and pass the resulting text back using image_to_string(...).
import pytesseract
# Capture frame...
# If the frame requires cropping:
frame = frame[y:y + h, x:x + w]
# Perform OCR
text = pytesseract.image_to_string(frame, lang="eng_fast" config="--psm 7")
# Process the result
health = int(text)
Alright, I was having the same issue as you so I did some research into it and I'm sure that I found the solution! First, you will need these libraries:
cv2
pytesseract
Pillow(PIL)
numpy
Installation:
To install cv2, simply use this in a command line/command prompt: pip install opencv-python
Installing pytesseract is a little bit harder as you also need to pre-install Tesseract which is the program that actually does the ocr reading. First, follow this tutorial on how to install Tesseract. After that, in a command line/command prompt just use the command: pip install pytesseract
If you don't install this right you will get an error using the ocr
To install Pillow use the following command in a command-line/command prompt: python -m pip install --upgrade Pillow or python3 -m pip install --upgrade Pillow. The one that uses python works for me
To install NumPy, use the following command in a command-line/command prompt: pip install numpy. Thought it's usually already installed in most python libraries.
Code:
This code was made by me and as of right now it works how I want it to and similar to the effect that Michal had. It will take the top left of your screen, take a recorded image of it and show a window display of the image it's currently using OCR to read. Then in the console, it is printing out the text that it read on the screen.
# OCR Screen Scanner
# By Dornu Inene
# Libraries that you show have all installed
import cv2
import numpy as np
import pytesseract
# We only need the ImageGrab class from PIL
from PIL import ImageGrab
# Run forever unless you press Esc
while True:
# This instance will generate an image from
# the point of (115, 143) and (569, 283) in format of (x, y)
cap = ImageGrab.grab(bbox=(115, 143, 569, 283))
# For us to use cv2.imshow we need to convert the image into a numpy array
cap_arr = np.array(cap)
# This isn't really needed for getting the text from a window but
# It will show the image that it is reading it from
# cv2.imshow() shows a window display and it is using the image that we got
# use array as input to image
cv2.imshow("", cap_arr)
# Read the image that was grabbed from ImageGrab.grab using pytesseract.image_to_string
# This is the main thing that will collect the text information from that specific area of the window
text = pytesseract.image_to_string(cap)
# This just removes spaces from the beginning and ends of text
# and makes the the it reads more clean
text = text.strip()
# If any text was translated from the image, print it
if len(text) > 0:
print(text)
# This line will break the while loop when you press Esc
if cv2.waitKey(1) == 27:
break
# This will make sure all windows created from cv2 is destroyed
cv2.destroyAllWindows()
I hope this helped you with what you were looking for, it sure did help me!
I've looked all over and can't find an answer to this question, so I am going to go ahead and ask. I am trying to figure out opencv for python and I am getting stuck. Everytime I run this code it tells me there is no attribute named 'CV_CHAIN_APPROX_SIMPLE'. Then I tried using every CV_CHAIN_APPROX and nothing worked. I even went to my IDLE and it didn't know what it was either, so I'm stuck.
I am trying to make a program that will find the center cordinates of a moving target. Would using findContours be the right direction?
Sorry! I'm new to stackoverflow, so I'm not really sure what I can ask in these questions!
Thanks so much!!
import cv2
import numpy as np
cap = cv2.VideoCapture(0)
fgbg = cv2.createBackgroundSubtractorMOG2()
while True:
_, frame = cap.read()
fgmask = fgbg.apply(frame)
filtered = cv2.medianBlur(fgmask, 15)
(a, b, c) = cv2.findContours(filtered, cv2.RETR_EXTERNAL, cv2.CV_CHAIN_APPROX_SIMPLE)
cv2.imshow('cont', b)
cv2.imshow('fg', fgmask)
cv2.waitKey(1)
You are using Opencv 3.0> based on createBackgroundSubtractorMOG2, so your line need to be like:
(a, b, c) = cv2.findContours(filtered, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
Don't forget cv2. in front of every OpenCV function.
In version '2.4.9' I can clearly see cv2.CHAIN_APPROX_NONE (and others too) are you sure you haven't forgotten to add the cv2. in front of it like you did in your example. If not please add the version by doing cv2.__version__.
I've showed in various ways how to take images with a webcam in Python (see How can I take camera images with Python?). You can see that the images taken with Python are considerably darker than images taken with JavaScript. What is wrong?
Image example
The image on the left was taken with http://martin-thoma.com/html5/webcam/, the one on the right with the following Python code. Both were taken with the same (controlled) lightning situation (it was dark outside and I only had some electrical lights on) and the same webcam.
Code example
import cv2
camera_port = 0
camera = cv2.VideoCapture(camera_port)
return_value, image = camera.read()
cv2.imwrite("opencv.png", image)
del(camera) # so that others can use the camera as soon as possible
Question
Why is the image taken with Python image considerably darker than the one taken with JavaScript and how do I fix it?
(Getting a similar image quality; simply making it brighter will probably not fix it.)
Note to the "how do I fix it": It does not need to be opencv. If you know a possibility to take webcam images with Python with another package (or without a package) that is also ok.
Faced the same problem. I tried this and it works.
import cv2
camera_port = 0
ramp_frames = 30
camera = cv2.VideoCapture(camera_port)
def get_image():
retval, im = camera.read()
return im
for i in xrange(ramp_frames):
temp = camera.read()
camera_capture = get_image()
filename = "image.jpg"
cv2.imwrite(filename,camera_capture)
del(camera)
I think it's about adjusting the camera to light. The former
former and later images
I think that you have to wait for the camera to be ready.
This code works for me:
from SimpleCV import Camera
import time
cam = Camera()
time.sleep(3)
img = cam.getImage()
img.save("simplecv.png")
I took the idea from this answer and this is the most convincing explanation I found:
The first few frames are dark on some devices because it's the first
frame after initializing the camera and it may be required to pull a
few frames so that the camera has time to adjust brightness
automatically.
reference
So IMHO in order to be sure about the quality of the image, regardless of the programming language, at the startup of a camera device is necessary to wait a few seconds and/or discard a few frames before taking an image.
Tidying up Keerthana's answer results in my code looking like this
import cv2
import time
def main():
capture = capture_write()
def capture_write(filename="image.jpeg", port=0, ramp_frames=30, x=1280, y=720):
camera = cv2.VideoCapture(port)
# Set Resolution
camera.set(3, x)
camera.set(4, y)
# Adjust camera lighting
for i in range(ramp_frames):
temp = camera.read()
retval, im = camera.read()
cv2.imwrite(filename,im)
del(camera)
return True
if __name__ == '__main__':
main()
So, this is what I am trying:
import cv2
import cv2.cv as cv
cv2.namedWindow(threeDWinName, cv2.CV_WINDOW_AUTOSIZE)
img2 = cv.CreateImage((320, 240), 32, 1)
cv2.imshow(threeDWinName,img2)
Does anybody know what is going wrong with this? I get TypeError: <unknown> is not a numpy array
Thanks
The more recent version of OpenCV, cv2 uses numpy arrays for images, the preceding version cv used opencv's special Mat's. In your code you've created an image as a Mat using the old cv function CreateImage, and then tried to view it using the newer cv2.imshow function, but cv2.imshow expects a numpy array...
...so all you need to do is import numpy, and then change you CreateImage line to:
img2 = np.zeros((320,240),np.float32)
And then it should be fine :)