Let me split this question into two parts.
First, how can I restrict pyautogui.locateOnScreen() to a specific application window on Windows? For example, how do I search for an image only inside the Windows Calculator?
Second, I have two monitors. How do I search for an image on a specific monitor?
I wrote some simple code, but it doesn't work because my calculator is open on my second monitor.
import pyautogui

def main():
    try:
        while True:
            button7location = pyautogui.locateOnScreen('images/calc7Key.png', region=(0, 0, 1920, 1080), confidence=.5)
    except KeyboardInterrupt:
        pass  # stop on Ctrl+C
Unfortunately, PyAutoGUI currently doesn't work with multiple monitors. This is stated in its FAQ:
Q: Does PyAutoGUI work on multi-monitor setups?
A: No, right now PyAutoGUI only handles the primary monitor.
As for searching a specific area, you can use the optional region=(startXValue, startYValue, width, height) parameter, as shown here.
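To tie the region parameter to a specific application window, one option is to read the window rectangle and pass it as the region. The following is only a sketch: window_region and locate_in_window are hypothetical helper names, and it assumes the pygetwindow package is installed and that the window sits at least partly on the primary monitor.

```python
def window_region(win_left, win_top, win_width, win_height, screen_width, screen_height):
    """Clip a window rectangle to the primary screen.

    Returns (left, top, width, height), or None if nothing is visible.
    """
    left = max(win_left, 0)
    top = max(win_top, 0)
    right = min(win_left + win_width, screen_width)
    bottom = min(win_top + win_height, screen_height)
    if right <= left or bottom <= top:
        return None  # window lies entirely outside the primary monitor
    return (left, top, right - left, bottom - top)


def locate_in_window(image_path, window_title):
    """Search for image_path only inside the first window matching window_title."""
    # Imported lazily so window_region() can be used without a GUI session.
    import pyautogui
    import pygetwindow

    windows = pygetwindow.getWindowsWithTitle(window_title)
    if not windows:
        return None
    win = windows[0]
    screen_w, screen_h = pyautogui.size()
    region = window_region(win.left, win.top, win.width, win.height, screen_w, screen_h)
    if region is None:
        return None
    return pyautogui.locateOnScreen(image_path, region=region)
```

For the calculator example this would be called as locate_in_window('images/calc7Key.png', 'Calculator'). Depending on the PyAutoGUI version, a failed search either returns None or raises an exception, so the caller may need to handle both.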
This answer may be late, but for those still looking, the following quote resolved it for me:
If this is still relevant for someone on Windows:
In my opinion the issue is that the current version of pyscreeze,
which uses ImageGrab (Pillow) on Windows, only grabs a single screen.
A quick and dirty fix in pyscreeze could be:
Enable all-screen grabbing:
In file pyscreeze/__init__.py, function def _screenshot_win32(imageFilename=None, region=None):
change im = ImageGrab.grab() to im = ImageGrab.grab(all_screens=True)
Handle the negative coordinates introduced by multiple monitors:
In file pyscreeze/__init__.py, function def locateOnScreen(image, minSearchTime=0, **kwargs): after retVal = locate(image, screenshotIm, **kwargs), add:
if retVal and sys.platform == 'win32':
    # get the lowest x and y coordinate of the monitor setup
    monitors = win32api.EnumDisplayMonitors()
    x_min = min([mon[2][0] for mon in monitors])
    y_min = min([mon[2][1] for mon in monitors])
    # add negative offset due to multi monitor
    retVal = Box(left=retVal[0] + x_min, top=retVal[1] + y_min, width=retVal[2], height=retVal[3])
Don't forget to add the win32api import.
In file pyscreeze/__init__.py:
if sys.platform == 'win32':  # TODO - Pillow now supports ImageGrab on macOS.
    import win32api  # used for multi-monitor fix
    from PIL import ImageGrab
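The offset step in the patch above can be tried on its own, without touching pyscreeze. This is a sketch only: to_desktop_coords is a hypothetical helper, the local Box namedtuple merely mimics the one used in the patch, and the monitor rectangles are assumed to be (left, top, right, bottom) tuples such as the third element of each entry returned by win32api.EnumDisplayMonitors().

```python
from collections import namedtuple

# Stand-in for the Box type used in the patch above.
Box = namedtuple("Box", "left top width height")


def to_desktop_coords(box, monitor_rects):
    """Shift a match found in an all-screens screenshot into desktop coordinates.

    An all-screens screenshot has its (0, 0) pixel at the top-left corner of
    the virtual desktop, which has negative desktop coordinates when a
    monitor sits to the left of or above the primary one.
    """
    x_min = min(rect[0] for rect in monitor_rects)
    y_min = min(rect[1] for rect in monitor_rects)
    return Box(box.left + x_min, box.top + y_min, box.width, box.height)


# Primary monitor plus a second monitor placed to its left.
monitors = [(0, 0, 1920, 1080), (-1920, 0, 0, 1080)]
match = Box(left=100, top=200, width=52, height=52)
print(to_desktop_coords(match, monitors))
```

With the second monitor on the left, a match at x=100 in the screenshot corresponds to x=-1820 on the desktop, which is why the patch adds the (negative) minimum coordinates.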
I'm trying to check whether a notification dot is present at a fixed position on the screen on a website.
I found another question here that looked for black and white dots on the screen, and I tried that approach. However, it FINDS the dot when it should not, and I don't know what I'm doing wrong. I tried switching the values around because of the BGR/RGB ordering in cv2, but that should not be an issue with numpy, should it? Please help.
Here is my code:
import cv2 as cv
import numpy as np
from matplotlib import pyplot as plt
import pyautogui as pya
def test():
    screen = pya.screenshot(region=(550, 90, 30, 50))
    # screen.show()
    img = np.array(screen)
    x, y, z = np.where(img==(173,95,255))
    points = zip(x,y)
    if points:
        return True
    return False
I found a good solution.
Take a screenshot of the icon that has the small dot you want to find.
Here is my screenshot:
Then put that screenshot in the same folder as your Python code and name it notification.png.
Then run this code with the website open on your screen. It finds the coordinates of the box that contains the screenshot you just took (the icon plus the notification dot):
import pyautogui
import time
location = pyautogui.locateOnScreen('notification.png')
print(location)
In my case I received the following output (coordinates):
Box(left=1474, top=109, width=52, height=52)
Then check whether this notification image appears at the same place on the screen:
import pyautogui
import time
r = None
while r is None:
    try:
        location = pyautogui.locateOnScreen('notification.png')
        if str(location) == "Box(left=1474, top=109, width=52, height=52)":
            print("found Image!")
        else:
            print("Not found Image!")
    except Exception as e:
        r = None
If it finds the image on the screen, it prints "found Image!"; if not, it prints "Not found Image!".
The search may take a while if you are using two screens.
So you can add a timer to stop the script if the not-found case takes too long, or you can restrict the search area.
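As for why the code in the question always reports a hit: img == (173, 95, 255) compares each channel separately, so any pixel that matches on a single channel is counted, and a zip object is truthy even when it is empty. A minimal sketch of a whole-pixel check follows. contains_color is a hypothetical helper, and the tolerance parameter is my own addition for anti-aliased dots.

```python
def contains_color(pixels, target, tolerance=0):
    """Return True if any pixel is within `tolerance` of `target` on every channel.

    `pixels` is any iterable of channel tuples; extra channels such as alpha
    are ignored because zip() stops at the shorter sequence.
    """
    for pixel in pixels:
        if all(abs(c - t) <= tolerance for c, t in zip(pixel, target)):
            return True
    return False


# Each of these pixels matches the target on exactly one channel,
# so a per-channel comparison would wrongly report a hit.
near_misses = [(173, 0, 0), (0, 95, 0), (0, 0, 255)]
print(contains_color(near_misses, (173, 95, 255)))               # False
print(contains_color([(0, 0, 0), (173, 95, 255)], (173, 95, 255)))  # True
```

With the screenshot from the question, the pixels could be passed as np.array(screen).reshape(-1, img.shape[-1]).tolist(), where img is the array from the question. Staying in numpy, (img[..., :3] == (173, 95, 255)).all(axis=-1).any() expresses the same exact-match test.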
I've been trying hard, but I can't figure out how to resize a terminal window to the size I want. Can someone help me solve this? I would like a solution with the specific code for each operating system, ideally in one or a few lines of code per platform.
# -*- coding: utf-8 -*-
### Requirements for default python
from __future__ import absolute_import
from __future__ import print_function
from __future__ import generators
### Available for all python sources
from sys import platform
from os import system
class MainModule(object):
    def __init__(self, terminal_name, terminal_x, terminal_y):
        self.terminal_name = terminal_name
        self.terminal_x = terminal_x
        self.terminal_y = terminal_y
        if platform == "linux" or platform == "linux2":
            # Code to resize a terminal for linux distros only
            pass
        if platform == "win32" or platform == "win64":
            # Code to resize a terminal for windows only
            pass
        if platform == "darwin":
            # Code to resize a terminal for mac only
            pass
As you seem to have discovered, the implementation is platform-specific. You'll have to write code to do this for each platform.
On Windows, there are Windows APIs for this, and you can call them directly using the ctypes module. One example of this can be seen in the PyGetWindow package. AutoHotkey (via the ahk Python package) and pywinauto are alternative tools for doing this on Windows.
# example using the AHK package on Windows
from ahk import AHK
ahk = AHK()
win = ahk.find_window(title=b'Untitled - Notepad')
win.move(x=200, y=300, width=500, height=800)
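On macOS, you can write an AppleScript to resize the window and launch osascript from a subprocess.
# Using AppleScript on macOS
import subprocess
APPLICATION_NAME = "Terminal"  # assumed example; use the name of your application
X = 300
Y = 30
WIDTH = 1200
HEIGHT = 900
# AppleScript bounds are a list {left, top, right, bottom}, so WIDTH and
# HEIGHT act as the right and bottom edges here. The doubled braces in the
# f-string produce the literal braces of that list.
APPLESCRIPT = f"""
tell application "{APPLICATION_NAME}"
    set bounds of front window to {{{X}, {Y}, {WIDTH}, {HEIGHT}}}
end tell
"""
subprocess.run(['osascript', '-e', APPLESCRIPT], capture_output=True)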
For Linux, as Jeff mentions in the comments, the implementation depends on the window manager in use, and there are many of them. On popular platforms like Ubuntu, though, you can rely on existing tools such as the wmctrl package or similar packages.
# ref: https://askubuntu.com/a/94866
import subprocess
WINDOW_TITLE = "Terminal" # or substring of the window you want to resize
x = 0
y = 0
width = 100
height = 100
subprocess.run(["wmctrl", "-r", WINDOW_TITLE, "-e", f"0,{x},{y},{width},{height}"])
That said, if you are writing a game or similar, you can approach this differently. For example, pygame lets you set your window size. In text-based terminal applications, curses (or blessings, a popular wrapper for curses) can detect the terminal size so that you can resize your application dynamically, which may require some changes to your current code.
height = curses.LINES
width = curses.COLS
redraw(width, height) # you implement this to change how your app writes to the terminal
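If the goal is to resize the terminal window from inside the program that runs in it, another option on xterm-compatible terminals is the window-manipulation control sequence CSI 8 ; rows ; cols t. This is a sketch under that assumption: many terminal emulators ignore or disable the sequence, and resize_sequence, resize_terminal and current_size are hypothetical helper names.

```python
import shutil
import sys


def resize_sequence(rows, cols):
    """Build the xterm control sequence that requests a rows x cols text area."""
    return f"\x1b[8;{rows};{cols}t"


def resize_terminal(rows, cols, stream=sys.stdout):
    """Ask the terminal attached to `stream` to resize itself."""
    stream.write(resize_sequence(rows, cols))
    stream.flush()


def current_size():
    """Return (columns, lines) of the terminal, or (80, 24) if it cannot be detected."""
    size = shutil.get_terminal_size(fallback=(80, 24))
    return size.columns, size.lines


if __name__ == "__main__":
    resize_terminal(40, 120)
    print(current_size())
```

shutil.get_terminal_size works without initialising curses, so it can be used to check whether the terminal honoured the request.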
The problem
I'm trying to capture my desktop with OpenCV and have Tesseract OCR find text and store it in a variable. For example, if I were playing a game with the capture frame over a resource amount, I want to print that amount and use it. A perfect example of this is a video by Michael Reeves
where, whenever he loses health in a game, the program detects it and signals his Bluetooth-enabled airsoft gun to shoot him. So far I have this:
# imports
from PIL import ImageGrab
from PIL import Image
import numpy as np
import pytesseract
import argparse
import cv2
import os
fourcc = cv2.VideoWriter_fourcc(*'XVID')
out = cv2.VideoWriter("output.avi", fourcc, 5.0, (1366, 768))
while True:
    x = 760
    y = 968
    ox = 50
    oy = 22
    # screen capture
    img = ImageGrab.grab(bbox=(x, y, x + ox, y + oy))
    img_np = np.array(img)
    frame = cv2.cvtColor(img_np, cv2.COLOR_BGR2RGB)
    cv2.imshow("Screen", frame)
    if cv2.waitKey(1) == 0:
        break
It captures in real time and displays the result in a window, but I have no clue how to make it recognise the text in every frame and output it.
Any help?
It's fairly simple to grab the screen and pass it to tesseract for OCRing.
The PIL (pillow) library can grab the frames easily on MacOS and Windows. However, this feature has only recently been added for Linux, so the code below works around it not existing. (I'm on Ubuntu 19.10 and my Pillow does not support it).
Essentially the user starts the program with screen-region rectangle co-ordinates. The main loop continually grabs this area of the screen, feeding it to Tesseract. If Tesseract finds any non-whitespace text in that image, it is written to stdout.
Note that this is not a proper real-time system. There is no guarantee of timeliness; each frame takes as long as it takes. Your machine might get 60 FPS or it might get 6. This will also be greatly influenced by the size of the rectangle you ask it to monitor.
#! /usr/bin/env python3
import sys
import pytesseract
from PIL import Image
# Import ImageGrab if possible, might fail on Linux
try:
    from PIL import ImageGrab
    use_grab = True
except Exception as ex:
    # Some older versions of pillow don't support ImageGrab on Linux
    # In which case we will use XLib
    if ( sys.platform == 'linux' ):
        from Xlib import display, X
        use_grab = False
    else:
        raise ex
def screenGrab( rect ):
    """ Given a rectangle, return a PIL Image of that part of the screen.
        Handles a Linux installation with an older Pillow by falling-back
        to using XLib """
    global use_grab
    x, y, width, height = rect
    if ( use_grab ):
        # ImageGrab is imported by name above, so call it directly
        image = ImageGrab.grab( bbox=[ x, y, x+width, y+height ] )
    else:
        # ImageGrab can be missing under Linux
        dsp = display.Display()
        root = dsp.screen().root
        raw_image = root.get_image( x, y, width, height, X.ZPixmap, 0xffffffff )
        image = Image.frombuffer( "RGB", ( width, height ), raw_image.data, "raw", "BGRX", 0, 1 )
        # DEBUG image.save( '/tmp/screen_grab.png', 'PNG' )
    return image
### Do some rudimentary command line argument handling
### So the user can specify the area of the screen to watch
if ( __name__ == "__main__" ):
    EXE = sys.argv[0]
    del( sys.argv[0] )
    # EDIT: catch zero-args
    if ( len( sys.argv ) != 4 or sys.argv[0] in ( '--help', '-h', '-?', '/?' ) ):  # some minor help
        sys.stderr.write( EXE + ": monitors section of screen for text\n" )
        sys.stderr.write( EXE + ": Give x, y, width, height as arguments\n" )
        sys.exit( 1 )
    # TODO - add error checking
    x = int( sys.argv[0] )
    y = int( sys.argv[1] )
    width = int( sys.argv[2] )
    height = int( sys.argv[3] )
    # Area of screen to monitor
    screen_rect = [ x, y, width, height ]
    print( EXE + ": watching " + str( screen_rect ) )
    ### Loop forever, monitoring the user-specified rectangle of the screen
    while ( True ):
        image = screenGrab( screen_rect )              # Grab the area of the screen
        text = pytesseract.image_to_string( image )    # OCR the image
        # IF the OCR found anything, write it to stdout.
        text = text.strip()
        if ( len( text ) > 0 ):
            print( text )
This answer was cobbled together from various other answers on SO.
If you use this answer for anything regularly, it would be worth adding a rate-limiter to save some CPU. It could probably sleep for half a second every loop.
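The rate-limiter mentioned above could look roughly like this. It is a sketch, not part of the original answer: watch is a hypothetical generator, grab and ocr are whatever callables you pass in, and skipping repeated text is an extra assumption of mine that keeps the output quiet while the screen does not change.

```python
import time


def watch(grab, ocr, interval=0.5, max_iterations=None, sleep=time.sleep):
    """Yield each new non-empty OCR result, pausing `interval` seconds per loop.

    grab() returns an image and ocr(image) returns a string. Passing
    max_iterations limits the number of loops, which is handy for testing.
    """
    last_text = None
    iterations = 0
    while max_iterations is None or iterations < max_iterations:
        text = ocr(grab()).strip()
        if text and text != last_text:
            last_text = text
            yield text
        sleep(interval)  # rate limit: give the CPU a rest between frames
        iterations += 1


if __name__ == "__main__":
    # Stand-in frames instead of real screen grabs, to show the behaviour.
    frames = iter(["10", "10", "  ", "9"])
    for text in watch(lambda: next(frames), lambda frame: frame,
                      interval=0.1, max_iterations=4):
        print(text)
```

With the script above, the main loop would become something like: for text in watch(lambda: screenGrab(screen_rect), pytesseract.image_to_string): print(text).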
Tesseract is a single-use command-line application using files for input and output, meaning every OCR call creates a new process and initializes a new Tesseract engine, which includes reading multi-megabyte data files from disk. Its suitability as a real-time OCR engine will depend on the exact use case—more pixels requires more time—and which parameters are provided to tune the OCR engine. Some experimentation may ultimately be required to tune the engine to the exact scenario, but also expect the time required to OCR for a frame may exceed the frame time and a reduction in the frequency of OCR execution may be required, i.e. performing OCR at 10-20 FPS rather than 60+ FPS the game may be running at.
In my experience, a reasonably complex document in a 2200x1700px image can take anywhere from 0.5s to 2s using the english fast model with 4 cores (the default) on an aging CPU, however this "complex document" represents the worst-case scenario and makes no assumptions on the structure of the text being recognized. For many scenarios, such as extracting data from a game screen, assumptions can be made to implement a few optimizations and speed up OCR:
Reduce the size of the input image. When extracting specific information from the screen, crop the grabbed screen image as much as possible to only that information. If you're trying to extract a value like health, crop the image around just the health value.
Use the "fast" trained models to improve speed at the cost of accuracy. You can use the -l option to specify different models and the --tessdata-dir option to specify the directory containing your model files. You can download multiple models and rename the files to "eng_fast.traineddata", "eng_best.traineddata", etc.
Use the --psm parameter to prevent page segmentation not required for your scenario. --psm 7 may be the best option for singular pieces of information, but play around with different values and find which works best.
Restrict the allowed character set if you know which characters will be used, such as if you're only looking for numerics, by changing the whitelist configuration value: -c tessedit_char_whitelist='1234567890'.
pytesseract is the best way to get started with implementing Tesseract, and the library can handle image input directly (although it saves the image to a file before passing to Tesseract) and pass the resulting text back using image_to_string(...).
import pytesseract
# Capture frame...
# If the frame requires cropping:
frame = frame[y:y + h, x:x + w]
# Perform OCR
text = pytesseract.image_to_string(frame, lang="eng_fast", config="--psm 7")
# Process the result
health = int(text)
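Note that int(text) raises ValueError whenever the OCR returns an empty string or stray characters, which is common with screen captures. A small sketch of how the options above could be assembled and the result parsed more defensively follows. build_config and parse_int are hypothetical helpers, and taking the first run of digits is an assumption about what the value looks like on screen.

```python
import re


def build_config(psm=7, whitelist=None):
    """Assemble a Tesseract config string from the options discussed above."""
    parts = [f"--psm {psm}"]
    if whitelist:
        parts.append(f"-c tessedit_char_whitelist={whitelist}")
    return " ".join(parts)


def parse_int(text):
    """Return the first run of digits in the OCR output as an int, or None."""
    match = re.search(r"\d+", text)
    return int(match.group()) if match else None


config = build_config(psm=7, whitelist="0123456789")
print(config)
print(parse_int(" 87\n"))
print(parse_int(""))
```

The config string would then be passed as pytesseract.image_to_string(frame, lang="eng_fast", config=config), and a None result from parse_int can simply be skipped for that frame.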
Alright, I was having the same issue as you so I did some research into it and I'm sure that I found the solution! First, you will need these libraries:
To install cv2, simply use this in a command line/command prompt: pip install opencv-python
Installing pytesseract is a little bit harder as you also need to pre-install Tesseract which is the program that actually does the ocr reading. First, follow this tutorial on how to install Tesseract. After that, in a command line/command prompt just use the command: pip install pytesseract
If you don't install this right you will get an error using the ocr
To install Pillow use the following command in a command-line/command prompt: python -m pip install --upgrade Pillow or python3 -m pip install --upgrade Pillow. The one that uses python works for me
To install NumPy, use the following command in a command line/command prompt: pip install numpy. Though it's usually already installed in most Python distributions.
I wrote this code, and as of right now it works how I want it to, with an effect similar to the one Michael had. It grabs a region near the top left of your screen and shows a window displaying the image it is currently reading with OCR. It then prints the text it read from the screen to the console.
# OCR Screen Scanner
# By Dornu Inene
# Libraries that you should have installed
import cv2
import numpy as np
import pytesseract
# We only need the ImageGrab class from PIL
from PIL import ImageGrab
# Run forever unless you press Esc
while True:
    # This instance will generate an image from
    # the point of (115, 143) and (569, 283) in format of (x, y)
    cap = ImageGrab.grab(bbox=(115, 143, 569, 283))
    # For us to use cv2.imshow we need to convert the image into a numpy array
    cap_arr = np.array(cap)
    # This isn't really needed for getting the text from a window but
    # It will show the image that it is reading it from
    # cv2.imshow() shows a window display and it is using the image that we got
    # use array as input to image
    cv2.imshow("", cap_arr)
    # Read the image that was grabbed from ImageGrab.grab using pytesseract.image_to_string
    # This is the main thing that will collect the text information from that specific area of the window
    text = pytesseract.image_to_string(cap)
    # This just removes spaces from the beginning and end of the text
    # and makes the text it reads cleaner
    text = text.strip()
    # If any text was translated from the image, print it
    if len(text) > 0:
        print(text)
    # This line will break the while loop when you press Esc
    if cv2.waitKey(1) == 27:
        break
# This will make sure all windows created from cv2 are destroyed
cv2.destroyAllWindows()
I hope this helped you with what you were looking for, it sure did help me!
Besides looking for answers on this site, I checked out
pywinauto.application module
Getting Started Guide
but I'm still stumped.
I manually start notepad and want the first while block of the following code to make the notepad window visible. The second while block works but I am confused about the line
dlg_spec = app.UntitledNotepad
What is going on here? What kind of a python method is this?
Question: How do I get the first while block of code to make the window titled
Untitled - Notepad visible?
# Desc: Set focus on a window
# #--------*---------*---------*---------*---------*---------*---------*---------*
import sys
import pywinauto
# # Manually started Notepad
# # Want to make it visible (windows focus)
# # Program runs, but...
while 1:
    handle = pywinauto.findwindows.find_windows(title='Untitled - Notepad')[0]
    app = pywinauto.application.Application()
    ac = app.connect(handle=handle)
    topWin = ac.top_window_()
    sys.exit()
# # Working Sample Code
while 0:
    app = pywinauto.Application().start('notepad.exe')
    # describe the window inside Notepad.exe process
    # # ?1: '.UntitledNotepad' - huh?
    dlg_spec = app.UntitledNotepad
    # wait till the window is really open
    actionable_dlg = dlg_spec.wait('visible')
    sys.exit()
For convenience this code does the trick:
# # Manually started Notepad
# # Want to make it visible (windows focus).
# #
# # Two or three lines solution provided by
# # Vasily Ryabov's overflow answer
# # (wrapper ribbon and bow stuff).
while 1:
    app = pywinauto.application.Application().connect(title="Untitled - Notepad")
    dlg_spec = app.window(best_match="UntitledNotepad")
    dlg_spec.wrapper_object().set_focus()
    sys.exit()
I would suggest using the win32gui library for this task, as shown below:
import win32gui
# FindWindow matches the full window title exactly
hwnd = win32gui.FindWindow(None, 'Untitled - Notepad')
win32gui.ShowWindow(hwnd, 9)
The number 9 represents SW_RESTORE, as shown here.
Well, the first while loop should be rewritten using the same methods, but without find_windows (it's low-level and not recommended for direct use). You need the .set_focus() method to bring the window to the foreground.
app = pywinauto.Application().connect(title="Untitled - Notepad")
app.UntitledNotepad.set_focus()
Creating the window specification dlg_spec = app.UntitledNotepad means that the app object's __getattribute__ method is called. In the end, this line is equivalent to dlg_spec = app.window(best_match="UntitledNotepad"). To find the actual wrapper you need to call .wait(...) or .wrapper_object().
But when you call an action (like .set_focus()), Python can do the wrapper_object() call for you implicitly (while accessing attribute set_focus dynamically).
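To make the "what kind of Python method is this?" part more concrete, here is a toy model of the idea. It is not pywinauto's actual implementation: App and WindowSpec are made-up classes, and the toy uses __getattr__ (called only when normal lookup fails) rather than __getattribute__, simply because that is the shorter way to show the mechanism.

```python
class WindowSpec:
    """A description of a window; nothing is searched for until it is needed."""

    def __init__(self, criteria):
        self.criteria = criteria
        self.resolved = False

    def wrapper_object(self):
        # A real library would look up the actual window here.
        self.resolved = True
        return self

    def set_focus(self):
        # Actions resolve the specification implicitly before acting.
        if not self.resolved:
            self.wrapper_object()
        return f"focused window matching {self.criteria}"


class App:
    def window(self, **criteria):
        return WindowSpec(criteria)

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails, so app.UntitledNotepad
        # ends up here and becomes app.window(best_match="UntitledNotepad").
        if name.startswith("__"):
            raise AttributeError(name)
        return self.window(best_match=name)


app = App()
spec = app.UntitledNotepad
print(spec.criteria)
print(spec.resolved)
print(spec.set_focus())
```

The attribute name is just data handed to the lookup, which is why no method called UntitledNotepad appears anywhere in the documentation.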
I'm trying to take a screen shot of an applet running inside a
browser. The applet is using JOGL (OpenGL for Java) to display 3D
models. (1) The screen shots always come out either black or white. The
current solution uses the usual GDI calls. Screen shots of applets not
running OpenGL are fine.
A few examples of JOGL apps can be found here https://jogl-demos.dev.java.net/
(2) Another thing I'm trying to achieve is to get the scrollable area
inside the screen shot as well.
I found this code on the internet which works fine except for the 2
issues mentioned above.
import win32gui as wg
import win32ui as wu
import win32con
def copyBitMap(hWnd, fname):
    cWnd = wu.CreateWindowFromHandle(hWnd)
    rect = cWnd.GetClientRect()
    (x,y) = (rect[2] - rect[0], rect[3] - rect[1])
    hsrccDc = wg.GetDC(hWnd)
    hdestcDc = wg.CreateCompatibleDC(hsrccDc)
    hdestcBm = wg.CreateCompatibleBitmap(hsrccDc, x, y)
    wg.SelectObject(hdestcDc, hdestcBm.handle)
    wg.BitBlt(hdestcDc, 0, 0, x, y, hsrccDc, rect[0], rect[1], win32con.SRCCOPY)
    destcDc = wu.CreateDCFromHandle(hdestcDc)
    bmp = wu.CreateBitmapFromHandle(hdestcBm.handle)
    bmp.SaveBitmapFile(destcDc, fname)
Unless you are trying to automate it, I would just use a Firefox extension for this. There are a number of them returned from a search for "screenshot" that can take a screenshot of the entire browser page including the scrollable area:
Snapper (for older Firefox versions)
However, I apologize, I don't know enough about Python to debug your specific issue if you are indeed trying to do it programmatically.
Here is one way to do it: disable DWM (Desktop Window Manager) composition before taking the screen shot. Note that this causes the whole screen to blink whenever it's enabled or disabled.
from ctypes import WinDLL
from time import sleep
import win32gui as wg
import win32ui as wu
import win32con
def copyBitMap(hWnd, fname):
    dwm = WinDLL("dwmapi.dll")
    # Disable DWM composition (0 = DWM_EC_DISABLECOMPOSITION);
    # this has no effect on Windows 8 and later
    dwm.DwmEnableComposition(0)
    # Give the window sometime to redraw itself
    sleep(2)
    cWnd = wu.CreateWindowFromHandle(hWnd)
    rect = cWnd.GetClientRect()
    (x,y) = (rect[2] - rect[0], rect[3] - rect[1])
    hsrccDc = wg.GetDC(hWnd)
    hdestcDc = wg.CreateCompatibleDC(hsrccDc)
    hdestcBm = wg.CreateCompatibleBitmap(hsrccDc, x, y)
    wg.SelectObject(hdestcDc, hdestcBm.handle)
    wg.BitBlt(hdestcDc, 0, 0, x, y, hsrccDc, rect[0], rect[1], win32con.SRCCOPY)
    destcDc = wu.CreateDCFromHandle(hdestcDc)
    bmp = wu.CreateBitmapFromHandle(hdestcBm.handle)
    bmp.SaveBitmapFile(destcDc, fname)
    # Re-enable DWM composition (1 = DWM_EC_ENABLECOMPOSITION)
    dwm.DwmEnableComposition(1)
Grabbing an OpenGL window may be quite hard in some cases, since the OpenGL is being rendered by the GPU directly into its frame buffer. The same applies to DirectX windows and Video overlay windows.
Why not use the Screenshot class of JOGL?
com.jogamp.opengl.util.awt.Screenshot in JOGL 2.0 beta