Take a screenshot and use OCR on it


I know the OCR question with Python has already been discussed many times.
However, I didn't find anything that seems to help me except this question,
Python Tesseract OCR question,
but it didn't solve my problem.
I need to make a little script to capture the text inside an opened window (of a text editor).
So it should:
Take a screenshot
Find the position of the text editor window and crop the screenshot to it (I don't know if this step is needed)
Convert it to grayscale and pass it to tesseract
I'm fairly new to Python and I don't know if this is feasible.
Thanks in advance for any hint.
Giorgio

This is certainly possible, but generally unreasonable. There are usually better ways. Say you are parsing a webpage: you could grab the HTML text directly without running it through OCR, or, if you want to read the text in an image, fetch the page with urllib2 (urllib.request in Python 3), parse the HTML, select the image, and download it directly to a file. There are many HTML parser alternatives in Python that you can use as well. Converting to grayscale is simple with PIL or ImageMagick. From there, you can run the image through an OCR program, or do it within the script with a Python wrapper like python-tesseract.
Alternatively, if you insist on taking a screenshot, something like this would be useful for you. I still hold that there are almost always better ways, but this should get you started if you want to try it out.
import gtk.gdk  # PyGTK 2, which runs on Python 2 only

# Capture the whole screen through the root window.
w = gtk.gdk.get_default_root_window()
sz = w.get_size()
print("The size of the window is %d x %d" % sz)
pb = gtk.gdk.Pixbuf(gtk.gdk.COLORSPACE_RGB, False, 8, sz[0], sz[1])
pb = pb.get_from_drawable(w, w.get_colormap(), 0, 0, 0, 0, sz[0], sz[1])
if pb is not None:
    pb.save("screenshot.png", "png")
    print("Screenshot saved to screenshot.png.")
else:
    print("Unable to get the screenshot.")
This was taken from the question "Take a screenshot via a python script. [Linux]".
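For the steps listed in the question (screenshot, crop to the editor window, grayscale, OCR), one possible route is Pillow plus pytesseract. This is only a minimal sketch and not the approach of the answer above: it assumes Pillow and pytesseract are installed, that the Tesseract binary is on the PATH, and that the window's position and size are already known. The helper names `window_box` and `ocr_region` are made up for illustration.

```python
def window_box(left, top, width, height):
    """Convert a window's position and size into the (left, top, right, bottom)
    box that Pillow uses for cropping and for ImageGrab.grab(bbox=...)."""
    return (left, top, left + width, top + height)


def ocr_region(box):
    """Grab one screen region, convert it to grayscale, and return the OCR text."""
    # Imported lazily so window_box() works even without these packages.
    from PIL import ImageGrab   # assumption: Pillow is installed
    import pytesseract          # assumption: pytesseract and Tesseract are installed

    shot = ImageGrab.grab(bbox=box)   # capture only the requested rectangle
    gray = shot.convert("L")          # "L" is Pillow's 8-bit grayscale mode
    return pytesseract.image_to_string(gray)


# Example with hypothetical coordinates for a text editor window:
# text = ocr_region(window_box(100, 100, 800, 600))
```

Grabbing only the window's rectangle makes the separate "slice the screenshot" step unnecessary, since the crop happens at capture time.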

Related

Stuck with captcha solving in python

We are writing a very simple script for a game that automates an enhancing process (it's our own server, so it's just for the fun of it). During this process you occasionally get a captcha that you have to solve in order to continue enhancing. We are stuck on how to solve the captchas, and this is where we need your help. The code is written in Python and is very simple. The captcha is also very simple: it is only 3 digits (nothing other than the numbers 0-9). Here is what the captcha window looks like: [https://i.stack.imgur.com/27mAK.png]
The code looks like this:
import pyautogui
import time
import keyboard

while True:
    opt = pyautogui.locateOnScreen('asd.png', confidence=.95)  # looks for a good enchant
    forgat = pyautogui.locateOnScreen('forgatas.png')  # locates the button to press for enhancing
    stop = keyboard.is_pressed("shift")  # stops the loop with shift
    if opt:
        print('Done')
        break
    if stop:
        print('Stopped')
        break
    else:
        pyautogui.click(forgat)
        time.sleep(0.2)
Did some testing with pytesseract:
import cv2  # "from cv2 import cv2" fails on recent opencv-python releases
import pytesseract
import pyautogui

pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'
img = cv2.imread('ak.png')
text = pytesseract.image_to_string(img)
print(text)
It successfully converts the image to text, but we can't figure out how to copy the text into the text box in the game. Copying the in-game text is not an option.
We would also like suggestions for speeding up the process while the locateOnScreen function is still able to keep up (we don't want the code to skip over a good enchant by going too fast), and maybe for using something other than time.sleep, because it heavily taxes the system. Sorry if the code is messy; we are still very much beginners and never learned Python before. Any help would be greatly appreciated! Looking forward to any suggestions!

I suggest you try this library, which I found some time ago. If you have a set of labelled captchas, it should fit your needs. Take a look: https://github.com/punkerpunker/captcha_solver
In the README there is a section, "Train model on external data", that you might be interested in.
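On the part of the question about getting the OCR result into the game's text box, one option is to clean the OCR output down to digits and then type it with simulated keystrokes. The sketch below assumes the text box accepts simulated keyboard input and that its screen coordinates are known. `extract_digits`, `type_captcha`, and the coordinates are hypothetical, and how strictly Tesseract honors the digit whitelist depends on the installed version.

```python
import re


def extract_digits(ocr_text, expected_length=3):
    """Keep only the digits from raw OCR output; return None if the count is off."""
    digits = re.sub(r"\D", "", ocr_text)
    return digits if len(digits) == expected_length else None


def type_captcha(image_path, box_x, box_y):
    """OCR the captcha image, then click the text box and type the digits."""
    # Imported lazily; assumption: opencv-python, pytesseract and pyautogui are installed.
    import cv2
    import pyautogui
    import pytesseract

    img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    # --psm 7 treats the image as a single line of text; the whitelist asks
    # Tesseract to output digits only (support varies between versions).
    raw = pytesseract.image_to_string(
        img, config="--psm 7 -c tessedit_char_whitelist=0123456789"
    )
    digits = extract_digits(raw)
    if digits is None:
        return False                       # unreadable captcha: let the caller retry
    pyautogui.click(box_x, box_y)          # focus the text box (hypothetical coordinates)
    pyautogui.write(digits, interval=0.05)
    return True
```

As for speed, `pyautogui.locateOnScreen` also accepts a `region=(left, top, width, height)` argument, which limits the search to part of the screen and usually makes each lookup cheaper.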

Is it possible to create a desktop application with python and imported libraries?

I am a beginner at coding. I use Python and Windows 10.
I wrote a very simple script that takes a screenshot, opens it, and then loops over a list of template images containing all the possible answers, running template matching on each one to determine which object is in the screenshot. The code uses pyautogui and OpenCV:
import pyautogui
import cv2 as cv

def my_func():
    # train image
    pyautogui.screenshot("train.png")  # I am looking at the picture of an animal and the robot takes a screenshot and stores it.
    train_img = cv.imread("train.png", 0)
    # Contains all the images to iterate through
    template_list = ["apple.png", "person.png", "animal.png"]
    for i in template_list:
        # template image
        template_img = cv.imread(i, 0)
        # match template
        result = cv.matchTemplate(train_img, template_img, cv.TM_CCOEFF_NORMED)
        min_val, max_val, min_loc, max_loc = cv.minMaxLoc(result)
        if max_val >= .85:
            print(i)  # prints the name of the matched image
            return True
    print("could not match the train image to one of the available templates.")
    return False
The expected output is just for the console to print:
animal.png
I want to create an application, a window or anything of the sort where you click a button that says "Run" and then the code will run. When done, it will display the console log.
You can do this in VS Code, but while the code is running I can't see the console log (because I need to switch to the image that it will take a screenshot of), and I want to be able to see it.
So my questions are:
Is it possible to create a desktop app for windows to do this task?
Will that app work on other computers besides mine?
Do you recommend any other alternatives?

Thanks to @The Laggy Tablet, I did some research and found 100+ YouTube video tutorials about Tkinter by John Alder. I am not sure if I can post links, so I won't, but it is very easy to find.
It is actually pretty easy to use, and once you finish the GUI and package it as a standalone executable (for example with a tool such as PyInstaller), other people can use it without needing to install any dependencies, or even Python.
Hope this helps someone in the future. Cheers!
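For anyone wondering what the "Run" button with a visible log could look like in Tkinter, here is a minimal sketch. It assumes the function to run prints its results to standard output, as `my_func` in the question does. `run_and_capture` and `build_app` are hypothetical names, not part of Tkinter.

```python
import contextlib
import io


def run_and_capture(func):
    """Call func() and return everything it printed to stdout as one string."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        func()
    return buffer.getvalue()


def build_app(func):
    """Create a window with a Run button and a text area that shows func's output."""
    import tkinter as tk                       # imported lazily: needs a display
    from tkinter import scrolledtext

    root = tk.Tk()
    root.title("Runner")
    log = scrolledtext.ScrolledText(root, width=60, height=15)

    def on_run():
        log.insert(tk.END, run_and_capture(func))   # append the captured output
        log.see(tk.END)                              # scroll to the newest line

    tk.Button(root, text="Run", command=on_run).pack()
    log.pack()
    return root


# Example: build_app(my_func).mainloop()   # my_func is the function from the question
```

Because the window stays open while the function runs, it is worth placing it where it does not cover the image that the screenshot is supposed to capture.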

adding a quit timer that closes a photo

After many hours of searching online and in my Python book, I can't seem to find the answer to my question: what do I add to my code to get a timer that automatically closes the photo? The photo opens by itself, but then I have to close it manually to get back to my main program. Any help would be appreciated.
from PIL import Image
img = Image.open('battleship load screen.png')
img.show()

This is not possible using PIL alone: img.show() just launches another program. It is really intended for debugging, not for presenting things to the user.
From the docs:
Displays an image. This method is mainly intended for debugging
purposes.
On Unix platforms, this method saves the image to a temporary PPM
file, and calls the xv utility.
On Windows, it saves the image to a temporary BMP file, and uses the
standard BMP display utility to show it.
This method returns None.
If you want to display an image and have control over it, use a graphical toolkit and construct a UI for your purpose. I've linked to an example using PySide, a set of Qt bindings, but of course you could use any toolkit; each will be different.
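As an illustration of the toolkit approach, here is a minimal sketch that uses tkinter from the standard library rather than PySide. It assumes a Tk 8.6 build, which can read PNG files directly, and `show_image_for` is a hypothetical helper name.

```python
def show_image_for(path, seconds):
    """Show the image at `path` in its own window and close it after `seconds`."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")

    import tkinter as tk                    # imported lazily: needs a display

    root = tk.Tk()
    photo = tk.PhotoImage(file=path)        # Tk 8.6 can read PNG and GIF files
    tk.Label(root, image=photo).pack()
    # Schedule the window to close; after() takes its delay in milliseconds.
    root.after(int(seconds * 1000), root.destroy)
    root.mainloop()                         # returns once the window is destroyed


# Example: show_image_for("battleship load screen.png", 3)
```

The call blocks until the window closes, so the main program continues right after the timer fires.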

Python PIL-ImageGrab inaccurate when capturing game pixels

I am trying to capture the pixels of a game to script a bot. I have a simple function:
import win32gui
from PIL import ImageGrab

def printPixel():
    while True:
        flags, hcursor, (x, y) = win32gui.GetCursorInfo()
        print(x, y, ':', ImageGrab.grab().getpixel((x, y)))
This prints the current x,y coords and the RGB value of that pixel. This works as expected on my desktop hovering over various icons and such, but the same function does not work in-game. Any thoughts?
edit: When I save the image to a file and perform this same operation on the saved image, it works perfectly in-game. However, it is way slower. I'd like to operate on the image in memory, and not from a file.

Video games often talk to the graphics system directly for performance reasons, so some of the typical Windows APIs might not work on them. Try taking a screenshot by pressing the Print Screen key. If that captures the game, then you can take a screenshot in Python and check the captured image at the cursor position.
To take a screenshot on Windows, see the answer to the question "Fastest way to take a screenshot with python on windows"; it uses the win32gui library, which you are already using.
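The "take one screenshot, then inspect it" idea can be sketched as follows. This assumes Pillow and pywin32 are installed, and `sample_pixels` and `pixel_under_cursor` are hypothetical helpers. Whether the grab actually contains the game's frame still depends on how the game renders.

```python
def sample_pixels(image, points):
    """Read several pixels from one already-captured image.

    `image` is anything with a Pillow-style getpixel((x, y)) method.
    """
    return {point: image.getpixel(point) for point in points}


def pixel_under_cursor():
    """Grab the screen once and return the cursor position and the pixel there."""
    # Imported lazily; assumption: Pillow and pywin32 are installed (Windows only).
    import win32gui
    from PIL import ImageGrab

    _flags, _hcursor, (x, y) = win32gui.GetCursorInfo()
    frame = ImageGrab.grab()               # one in-memory screenshot, no file on disk
    return (x, y), sample_pixels(frame, [(x, y)])[(x, y)]
```

Reading many points from a single captured frame avoids taking a new screenshot for every pixel, which is typically the expensive part.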

Send an image to be printed to default printer...python

So, currently I am creating an application that, when you press a button on the GUI, needs to send the current image to the printer. It is running on Windows. I have been looking all around the standard library and for third-party packages that will help me do this.
Does anyone know of something that could help me with this problem?
Thanks.

Check this link: http://timgolden.me.uk/python/win32_how_do_i/print.html
And use it to print a bitmap file that you have saved to disk. JPG and PNG won't work for some reason, probably because they need to be converted to a printer-friendly format.
I don't know much about this stuff, but printing a bitmap with Tim Golden's code definitely works for me.
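A rough sketch of that workflow: convert the image to BMP with Pillow, then hand the file to Windows with the "print" shell verb through pywin32's `win32api.ShellExecute`. This assumes Pillow and pywin32 are installed and that some program is registered to print BMP files. `bmp_path_for` and `print_image` are hypothetical helper names.

```python
import os


def bmp_path_for(path):
    """Return the same file name with a .bmp extension."""
    root, _ext = os.path.splitext(path)
    return root + ".bmp"


def print_image(path):
    """Convert an image to BMP and hand it to the Windows "print" shell verb."""
    # Imported lazily; assumption: Pillow and pywin32 are installed (Windows only).
    import win32api
    from PIL import Image

    bmp_path = bmp_path_for(path)
    Image.open(path).convert("RGB").save(bmp_path)   # format inferred from ".bmp"
    # Ask Windows to print the file with whatever program is registered for BMP files.
    win32api.ShellExecute(0, "print", bmp_path, None, ".", 0)


# Example: print_image("current_image.png")   # hypothetical file name
```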
