We are writing a very simple script that automates an enhancing process in a game (it's our own server, so it's just for fun). During this process you occasionally get a captcha, which you have to solve in order to continue enhancing. We are stuck on how to solve the captchas, and this is where we need your help. The code is written in Python and is very simple. The captcha is also very simple: it is only 3 digits (nothing other than the numbers 0-9). Here is what the captcha window looks like: [https://i.stack.imgur.com/27mAK.png]
The code looks like this:
import pyautogui
import time
import keyboard
while True:
    opt = pyautogui.locateOnScreen('asd.png', confidence=.95)  # looks for a good enchant
    forgat = pyautogui.locateOnScreen('forgatas.png')  # locates the button to press for enhancing
    stop = keyboard.is_pressed("shift")  # stops the loop with shift
    if opt:
        print('Done')
        break
    if stop:
        print('Stopped')
        break
    else:
        pyautogui.click(forgat)
        time.sleep(0.2)
Did some testing with pytesseract:
import cv2
import pytesseract
import pyautogui
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'
img = cv2.imread('ak.png')
text = pytesseract.image_to_string(img)
print(text)
It successfully converts the image to text, but we can't figure out how to get the text into the text box in the game. Copying the in-game text is not an option.
We would also like suggestions on speeding up the process while locateOnScreen is still able to keep up (we don't want the code to skip over a good enchant by going too fast), and maybe on using something other than time.sleep, because it seems to tax the system heavily. Sorry if the code is messy; we are still very much beginners and have never learned Python before. Any help would be greatly appreciated! Looking forward to any suggestions!
I suggest trying this library, which I found some time ago. If you have a set of labelled captchas, it should fit your case. Take a look: https://github.com/punkerpunker/captcha_solver
In the README there is a section, "Train model on external data", that you might be interested in.
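On the narrower question of getting the recognized digits into the text box: here is a minimal sketch, assuming the box accepts simulated keystrokes once it has focus. The names `extract_digits` and `solve_captcha` and the `textbox_xy` coordinates are hypothetical, not from the original post. The Tesseract options (`--psm 7` for a single line of text, plus a digit whitelist) are a guess at what suits a three-digit captcha and may need tuning.

```python
import re


def extract_digits(ocr_text, expected_len=3):
    """Strip everything but digits; return None if the count is off."""
    digits = re.sub(r"\D", "", ocr_text)
    return digits if len(digits) == expected_len else None


def solve_captcha(image_path, textbox_xy):
    """OCR the captcha image and type the result into the game.

    textbox_xy is a hypothetical (x, y) screen position of the text box.
    """
    # Third-party imports are kept local so extract_digits can be used
    # without them installed.
    import cv2
    import pyautogui
    import pytesseract

    img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    text = pytesseract.image_to_string(
        img, config="--psm 7 -c tessedit_char_whitelist=0123456789"
    )
    code = extract_digits(text)
    if code is None:
        return False  # OCR did not yield exactly three digits; retry
    pyautogui.click(*textbox_xy)          # give the text box focus
    pyautogui.write(code, interval=0.05)  # type the digits one by one
    pyautogui.press("enter")              # assumption: Enter submits
    return True
```

On Windows, `pytesseract.pytesseract.tesseract_cmd` still has to be set as in the question before calling `solve_captcha('ak.png', (x, y))`. Rejecting any OCR result that is not exactly three digits keeps a misread from being typed into the game.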
Related
I should mention that I'm new to Python. I decided to create a bot for a multiplayer game that automatically buys items on the auction house, using opencv and pyautogui. So far everything was going pretty well: the cursor moves to the right point on the screen (reload auction), but
pyautogui.click(clicks=1) isn't working in the game window.
The IDE (PyCharm) is running with admin rights. I've googled a lot about the topic, but nothing has worked so far. I'd be glad if anyone could help; this is my first big project and I really want to make it work, so hopefully you guys can help me :D
Additional info: the game uses the BattlEye anti-cheat, and the engine is probably Java-based (the game is called Stalcraft; you can find it on Steam. It looks like something Minecraft-based, but I'm not sure about that).
OS: Win 10 x64
Python:3.11
What I tried:
The pyautogui library. I tried pyautogui.moveTo(x, y) and pyautogui.locateCenterOnScreen("whatever.png", confidence=0.85). The first method works only in the IDE; the second one, based on image recognition, also works with a browser. I tried this in other apps, but with no results: the cursor just hovers over the right place, with no clicks at all.
The pydirectinput library.
Here's what I've got so far:
import cv2
import random
import pyautogui
from time import sleep
import imutils
import numpy as np
import pydirectinput
pyautogui.FAILSAFE=True
rng=random.uniform(0.87, 1.3)
sleep(5)
pyautogui.size()
print(pyautogui.size())
pyautogui.position()
print(pyautogui.position())
pyautogui.moveTo(x=1344, y=342, duration=rng)
pyautogui.click(1344, 342, clicks=5)
This might be very silly. But I had no luck with .click. In my sample I used .moveTo as you did, but then .mouseDown() and .mouseUp() to simulate a click. Also sometimes it required a delay between the two. I wonder if that combo would help instead?
pyautogui.mouseDown()
sleep(1)
pyautogui.mouseUp()
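Put together with the move, the press/wait/release idea above could be wrapped as follows. This is only a sketch: `hold_click` and its `backend` parameter are hypothetical names, and whether the game accepts the input at all is a separate matter. The `backend` argument is there because pydirectinput exposes `moveTo`, `mouseDown` and `mouseUp` under the same names, so either library can be passed in.

```python
import time


def hold_click(x, y, hold=0.1, backend=None):
    """Move to (x, y), press the button, wait `hold` seconds, release."""
    if backend is None:
        import pyautogui as backend  # default; pydirectinput also fits
    backend.moveTo(x, y)
    backend.mouseDown()
    time.sleep(hold)  # some programs ignore a press that is released at once
    backend.mouseUp()
```

For example, `hold_click(1344, 342, hold=0.2)` uses pyautogui, while `hold_click(1344, 342, backend=pydirectinput)` tries the other library with the same timing.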
I am a beginner at coding. I use Python on Windows 10.
I wrote a very simple script that captures and then opens an image, then loops over match template calls to determine which object is in that image, using a list containing all the possible answers. The code uses pyautogui and opencv:
import pyautogui
import cv2 as cv
def my_func():
    # train image
    pyautogui.screenshot("train.png")  # I am looking at the picture of an animal and the robot takes a screenshot and stores it.
    train_img = cv.imread("train.png", 0)
    # Contains all the images to iterate through
    template_list = ["apple.png", "person.png", "animal.png"]
    for i in template_list:
        # template image
        template_img = cv.imread(i, 0)
        # match template
        result = cv.matchTemplate(train_img, template_img, cv.TM_CCOEFF_NORMED)
        min_val, max_val, min_loc, max_loc = cv.minMaxLoc(result)
        if max_val >= .85:
            print(i)  # prints the name of the matched image
            return True
    print("could not match the train image to one of the available templates.")
    return False
The expected output is just for the console to print:
animal.png
I want to create an application, a window or anything of the sort where you click a button that says "Run" and then the code will run. When done, it will display the console log.
You can do this in VS Code, but while the code is running I can't see the console log (because I need to switch to the image it will take a screenshot of), and I want to be able to see it.
So my questions are:
Is it possible to create a desktop app for windows to do this task?
Will that app work on other computers besides mine?
Do you recommend any other alternatives?
Thanks to #The Laggy Tablet, I did some research and found 100+ youtube video tutorials about Tkinter by John Alder. I am not sure if I can post links, so I won't, but it is very easy to find.
It is actually pretty easy to use, and once you finish the GUI and package it as a standalone executable (for example with a tool such as PyInstaller), other people can use it without needing to install any dependencies, or even Python.
Hope this helps someone in the future. Cheers!
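For anyone who wants a starting point, here is a minimal Tkinter sketch of the "Run button plus visible log" idea. The names `run_and_capture` and `build_window` are made up for this example, and it assumes the task reports its result with `print`, as `my_func` does.

```python
import contextlib
import io


def run_and_capture(func):
    """Call func() and return everything it printed as a string."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        func()
    return buffer.getvalue()


def build_window(task):
    """Create a window with a Run button and a scrolling log area."""
    import tkinter as tk  # local import: only needed when a display exists
    from tkinter import scrolledtext

    root = tk.Tk()
    root.title("Runner")
    log = scrolledtext.ScrolledText(root, width=60, height=15)

    def on_run():
        # Run the task and append whatever it printed to the log.
        log.insert(tk.END, run_and_capture(task))
        log.see(tk.END)

    tk.Button(root, text="Run", command=on_run).pack()
    log.pack()
    return root
```

Calling `build_window(my_func).mainloop()` opens the window. The task runs on the GUI thread, so the window is unresponsive until it returns, which is acceptable for a short job like one screenshot and a few template matches.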
Edit:
It might help to know that I'm using python 2.7.9 (that's what was taught in my GIS class).
I've almost got it working I think. Although now it's a new question.
I have this code
from PIL import Image
im = Image.open("C:/users/Chrostopher/Asuna.png")
There are no error messages and my screen flashed black like it wanted to do something, but the picture didn't show/open/display. What should I do?
Thanks for all the help so far. I feel like I'm slowly (and with many mistakes) learning something useful.
Old:
I am very, very new at this, which is why I'm asking. I've looked around for help, but there's always one thing I don't understand and it's just turned into a very deep rabbit hole.
When I've tried the code I've seen here, it doesn't work. Looking further, I need the Python Image Library (PIL). I've downloaded it, but I can't figure out how to set it up to work in Python. The file is a .gz. Is there some place I need to put the file or some way to import it?
If you could answer step by step, that would be wonderful for this extreme newb.
This is the code I have (to try and open an image which is the end goal)
import Image
def main():
    filename = "desert.jpg"
    image = Image.open(filename)
    image.show
    del image
if (__name__ == "__main__"):
    main()
Is there something I'm missing or not doing right that is messing up what I'm trying to do?
First install Pillow with "pip"
$ pip install Pillow
Then, instead of writing
import Image
in the first line, you can use:
from PIL import Image
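With that import in place, a corrected version of the script might look like the sketch below. It assumes Python 3 with Pillow installed; `show_image` is a name chosen for this example. Note that the question's code has `image.show` without parentheses, which only references the method and never calls it.

```python
from pathlib import Path


def show_image(path):
    """Open an image with Pillow and hand it to the default viewer."""
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(path)  # fail early with a clear message
    from PIL import Image  # provided by the Pillow package

    with Image.open(path) as image:
        image.show()  # the parentheses matter: `image.show` alone does nothing
```

Usage would be `show_image("desert.jpg")`, with the file in the directory the script is run from or given as a full path.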
I would like to display an image with Python and close it after the user enters the name of the image in the terminal. I use PIL to display the image; here is the code:
im = Image.open("image.jpg")
im.show()
My application displays this image, and the user's task is to recognize the object in the image and write the answer in the terminal. If the answer entered is correct, the user should get another image. The problem with PIL is that I can't close the image; from my research, the only solution is to kill the image viewer process, but this is neither reliable nor elegant.
Are there any other libraries for displaying images that have methods like .show() and .close()?
Just open any image viewer/editor in a separate process and kill it once the user has answered your question, e.g.
from PIL import Image
import subprocess
p = subprocess.Popen(["display", "/tmp/test.png"])
raw_input("Give a name for image:")  # Python 2; use input() on Python 3
p.kill()
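The same idea as a small Python 3 helper, offered as a sketch: `ask_while_open` is a hypothetical name, and the viewer command is whatever program you have available (`display` in the snippet above is ImageMagick's viewer).

```python
import subprocess


def ask_while_open(viewer_cmd, prompt, ask=input):
    """Start viewer_cmd, ask the question, then close the viewer."""
    viewer = subprocess.Popen(viewer_cmd)
    try:
        return ask(prompt)
    finally:
        viewer.terminate()  # ask the viewer to exit, even if ask() failed
        viewer.wait()       # reap the process so it does not linger
```

For example, `name = ask_while_open(["display", "/tmp/test.png"], "Give a name for image: ")`. The `try`/`finally` makes sure the viewer is closed even if the user interrupts the prompt.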
A little late to the party, but (as a disgruntled data scientist who really can't be bothered to learn GUI programming for the sake of displaying an image) I can probably speak for several other folks who would like to see an easier solution for this. I figured out a little workaround by expanding on Anurag's solution:
Make a second python script (let's call it 'imviewer.py'):
from skimage.viewer import ImageViewer
from skimage.io import imread
img = imread('image.png') #path to IMG
view = ImageViewer(img)
view.show()
Then in your main script do as Anurag suggested:
import subprocess
p = subprocess.Popen(['python', 'imviewer.py'])  # an argument list works on every platform
#your code
p.kill()
You can make the main script save the image you want to open with 'imviewer.py' temporarily, then overwrite it with the next image etc.
Hope this helps someone with this issue!
A terminal is meant to deal with a linear command flow: it asks a question, the user answers, and then it can ask a different question. What you are trying to do here is have the terminal do two things at once, show an image and ask the user a question. To do this, you can do one of two things:
Multiprocessing
You can start a new thread/process and make PIL show the image from it, while the first thread/process asks the user the question. After the user answers, you can close the other thread/process. Take a look at Python's threading module for more information on how to do that.
GUI
Instead of making your user interface in the terminal, make a simple GUI application using whatever framework you are comfortable with. I personally like PyQt4. Qt is a very powerful GUI development toolkit, and PyQt4 is a wrapper for it. If you make a GUI, then what you are trying to do is rather trivial.
Not all GUIs are difficult to use.
Here is a single-line solution using PySimpleGUI. Normally I wouldn't write it as a single line, but since it's a one-off that probably won't need adding to, it's OK here.
import PySimpleGUI as sg
sg.Window('My window').Layout([[ sg.Image('PySimpleGUI.png') ]]).Read()
Might be an overkill, but for me the easiest and most robust solution was just to use matplotlib as it properly keeps track of the figures it creates, e.g. :
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
imgplot = plt.imshow(mpimg.imread('animal.png'))
plt.ion()
plt.show()
animal_name = raw_input("What is the name?: ")  # Python 2; use input() on Python 3
plt.close()
I know the OCR question with Python has already been discussed many times.
However, I didn't find anything that seems to help me except this question:
Python Tesseract OCR question.
But it didn't solve my problem.
I need to make a little script to capture the text inside an opened window (of a text editor).
So it should:
Take a screenshot
Find the position of the text editor window and crop the screenshot to it (not sure whether this step is needed)
Convert it to grayscale and pass it to tesseract
I'm kinda newbie to Python and I dunno if this is feasible.
However thanks in advance for any hint.
Giorgio
This is certainly possible, but generally unreasonable. There are better ways. Say you are parsing a webpage: you could either grab the HTML text without running it through OCR or, if you want to read the text of an image, parse the HTML with urllib2, select the image, and download it directly to a file. There are many HTML parser alternatives in Python that you can use as well. Greyscale conversion is simple with PIL or ImageMagick. From there, you can run the image through an OCR program, or do it within the script with a Python wrapper like python-tesseract.
Alternatively, if you insist on taking a screenshot, something like this would be useful for you. I still hold that there are almost always better ways, but this should get you started if you want to try it out.
# PyGTK 2 (Python 2 only)
import gtk.gdk
w = gtk.gdk.get_default_root_window()
sz = w.get_size()
print "The size of the window is %d x %d" % sz
pb = gtk.gdk.Pixbuf(gtk.gdk.COLORSPACE_RGB,False,8,sz[0],sz[1])
pb = pb.get_from_drawable(w,w.get_colormap(),0,0,0,0,sz[0],sz[1])
if pb is not None:
    pb.save("screenshot.png","png")
    print "Screenshot saved to screenshot.png."
else:
    print "Unable to get the screenshot."
This was taken from "Take a screenshot via a python script. [Linux]".
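For comparison, the three steps listed in the question (screenshot, crop to the window, grayscale plus Tesseract) can be sketched with pyautogui and pytesseract on Python 3. This is an assumption-laden outline rather than a tested recipe: `rect_to_region` and `read_window_text` are hypothetical names, finding the editor window's rectangle is left to the caller, and pyautogui's screenshot support depends on the platform.

```python
def rect_to_region(left, top, right, bottom):
    """Convert window corners to the (left, top, width, height) form
    that pyautogui.screenshot takes for its `region` argument."""
    return (left, top, right - left, bottom - top)


def read_window_text(window_rect):
    """Screenshot the given rectangle, grayscale it, and run Tesseract.

    window_rect is a hypothetical (left, top, right, bottom) tuple for
    the text editor window, obtained by whatever means you have.
    """
    import pyautogui    # local imports: third-party packages
    import pytesseract

    # Capturing only the region does the "slice the screenshot" step.
    shot = pyautogui.screenshot(region=rect_to_region(*window_rect))
    gray = shot.convert("L")  # PIL image converted to 8-bit grayscale
    return pytesseract.image_to_string(gray)
```

A call such as `read_window_text((100, 50, 900, 650))` would return the recognized text as a string. Passing `region` means no separate cropping step is needed.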