better way to automate mouse&keyboard using pyautogui - python

I wrote a script using pyautogui that should start a program (an IDE) and then start using it.
This is the script so far:
#! python3
# mouseNow.py - Starts the IDE, clicks its prompt and types a string.
import subprocess
from time import sleep

import pyautogui

x, y = 1100, 550  # where the prompt appears on *this* screen
# Popen returns immediately; subprocess.call would block until the program exits
subprocess.Popen([r'C:\...exe', arg1, arg2])
sleep(5)  # wait for the prompt; 2 sec should suffice but this is for safety
pyautogui.click(x, y)
pyautogui.typewrite(my_string)
pyautogui.press('enter')
This works well, but I want it to be portable. The x,y values were determined by where the program's prompt appears on screen after I start the program, which I don't think is portable. Is there a way to point the mouse to the prompt without hard-coding coordinates? Something like move_mouse_to_window_of_this_process_after_starting_it().
Also, I use sleep() so that I write the data to the window only after it appears, but I guess that's not a good approach (some PCs will run this much slower). Is there a way to know when the prompt has appeared and only then call pyautogui.typewrite(my_string)?
EDIT: I found a simple solution for move_mouse_to_window_of_this_process_after_starting_it():
>>> pyautogui.hotkey('alt', 'tab')

If you need a portable and reliable solution, you have to find a library that supports accessibility technologies to access GUI elements by their text. The basic technologies are:
Win32 API, MS UI Automation (Windows)
AT-SPI (Linux)
Apple Accessibility API (MacOS)
There are several open-source GUI automation libraries supporting some of these technologies (usually 1 or 2). Python solutions:
pywinauto on Windows (both Win32 API & MS UIA, see Getting Started Guide)
pyatspi2 on Linux
pyatom on MacOS
There is also a thread on StackOverflow regarding hard sleeps vs flexible waiting.
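As a rough illustration of flexible waiting, here is a minimal polling helper that could replace the fixed sleep(5) in the question. It is a generic sketch, not part of pyautogui or pywinauto: wait_until, timeout and interval are names chosen for this example, and the condition you pass in decides what "ready" means.

```python
import time


def wait_until(condition, timeout=30.0, interval=0.25):
    """Poll condition() until it returns a truthy value or `timeout` seconds pass.

    Returns the truthy value; raises TimeoutError if the deadline is reached.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(interval)  # short pause between checks instead of one long sleep
```

With pyautogui, one hypothetical condition is a screenshot match of the prompt: a small function that calls pyautogui.locateOnScreen('prompt.png') (prompt.png being a saved image of the prompt) and returns the result. Depending on the PyAutoGUI version, locateOnScreen either returns None or raises pyautogui.ImageNotFoundException when nothing matches, so that function should catch the exception and return None. A fast machine then continues as soon as the prompt shows up, and a slow one simply polls longer.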
Enjoy! :)

The way you are interacting with the .exe leaves no alternative to coordinates or blind firing (Tab, Tab, Enter, etc.).
If the application has an API, you could interact with it programmatically.
If it doesn't, you can only try to match the location for each screen resolution, and only if the GUI runs fullscreen or in windowed fullscreen.
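If you stay with coordinates, one minimal sketch of "matching the location" is to record the point once at a known resolution and rescale it proportionally. This assumes the window fills the screen and that its layout scales linearly with the resolution, which real GUIs often don't; the 1920x1080 recording resolution below is only an example.

```python
def scale_point(point, recorded_size, current_size):
    """Map an (x, y) point measured on a screen of recorded_size=(width, height)
    to the equivalent point on a screen of current_size=(width, height)."""
    x, y = point
    recorded_w, recorded_h = recorded_size
    current_w, current_h = current_size
    # Scale each axis independently and round to whole pixels
    return round(x * current_w / recorded_w), round(y * current_h / recorded_h)


# Example: the question's (1100, 550), assumed to be measured at 1920x1080
print(scale_point((1100, 550), (1920, 1080), (1280, 720)))  # (733, 367)
```

At run time, current_size can come from pyautogui.size(), which returns the width and height of the primary screen.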

Related

Send keystrokes to non-active GUI application without occupying the keyboard

As the title explains, I'm trying to use the terminal to send commands as keystrokes to a GUI application that's minimized.
There are a lot of similar questions here on Stack Overflow with some great answers, but I'm having three main problems with the solutions I saw: most of them need the automated application to be the active one; or I can't use my keyboard normally while the script/process is running; or, worse, the solution works only on Windows.
I need what this person asked 2 months ago: Send keystrokes to a specific window (in background), but do something else in the meantime
But I want it on Linux.
I'm using Kubuntu 18.10, if that helps.
xdotool was close, but I couldn't quite get it to send the commands to a specific window or PID. It also uses "my keyboard", so I can't, for example, write an essay/code/browse online while xdotool is running. Pexpect also has this last problem.
AutoHotKey looks like it would work, but it's only for Windows and I'm trying not to use Wine. Same with pywin32.
keyboard (https://github.com/boppreh/keyboard) seems nice, but it can't send a command to a specific application. Same with PyAutoGUI.
I selected the Python tag because most of the solutions I saw use Python, but I'm open to any language.
Use a nested X server to input keystrokes without changing focus or keyboard grab.
Proof of concept:
Xephyr -resizeable :13
export DISPLAY=:13
xterm
xdotool type rhabarber
The Xephyr nested X server is started and will listen on local X socket 13 (whereas :0 typically identifies the currently running X server, though it can be higher when multiple sessions run concurrently).
Then we set DISPLAY environment variable to :13, so any X application we start will connect to Xephyr; xterm is our target application here. Using xdotool or any other tool we can send keystrokes.
As the target X server is identified through $DISPLAY, applications can be started or input events triggered from elsewhere as well. If needed, you might also run a lightweight window manager within Xephyr, e.g. to 'maximize' the application so that it fills the whole Xephyr window.
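The same proof of concept can be driven from Python with subprocess. This is a sketch under several assumptions: Xephyr, xterm and xdotool are installed, display :13 is free, and the appearance of the server's Unix socket (/tmp/.X11-unix/X13) is treated as a sign that Xephyr is ready, which is a heuristic rather than a guarantee. Passing env= to each child leaves the outer session's DISPLAY untouched, so your own keyboard and focus stay free.

```python
import os
import subprocess
import time


def display_env(display):
    """Return a copy of the current environment with DISPLAY set to `display`."""
    env = os.environ.copy()
    env["DISPLAY"] = display
    return env


def x_socket_path(display):
    """Local Unix socket an X server uses for `display`, e.g. ':13' -> X13."""
    number = display.lstrip(":").split(".")[0]
    return f"/tmp/.X11-unix/X{number}"


def type_into_nested(text, display=":13", timeout=10.0):
    """Start Xephyr on `display`, run xterm inside it and type `text` there."""
    server = subprocess.Popen(["Xephyr", "-resizeable", display])
    deadline = time.monotonic() + timeout
    # Heuristic readiness check: wait for the server's socket to appear
    while not os.path.exists(x_socket_path(display)):
        if time.monotonic() > deadline:
            server.terminate()
            raise TimeoutError(f"Xephyr did not start on {display}")
        time.sleep(0.1)
    env = display_env(display)
    client = subprocess.Popen(["xterm"], env=env)
    time.sleep(1)  # crude pause so xterm can map its window before typing
    # xdotool talks only to the nested server because DISPLAY points there
    subprocess.run(["xdotool", "type", text], env=env, check=True)
    return server, client
```

Calling type_into_nested("rhabarber") mirrors the four shell commands above; terminate the returned processes when you are done.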

Capture and consume input events from background python process without notifying focused window

Basically, I would like to write a small script that gives me some sort of programmable keyboard emulation, similar to how autohotkey works on Windows.
Let's say I would like to rebind the arrow keys to 'wsad' or 'hjkl', but only when CapsLock is active. I was able to detect key presses with pynput (https://pypi.python.org/pypi/pynput), and I can easily send various keyboard events to the focused window with pyautogui (https://pyautogui.readthedocs.io). But I can't figure out a way to consume events before they are received by the currently focused window.
Any hints?
THIS module is one of the available tools for capturing keyboard events:
https://pypi.python.org/pypi/keyboard/
but it is still in development and doesn't (yet) provide a global hook capable of capturing keyboard events at their very origin and forwarding them (or not) to the target application.
Another tool worth looking into is:
myboard.py at code.google.com downloads
The above script is using Python ctypes and Xlib modules which makes it possible to work directly with the system libraries written in C. It catches the keyboard events quite deep and system wide to a degree that it had crashed my OS while testing it a bit, so be warned ...
Consider also using XGrabKey and XGrabKeyboard from the X11 libX11.so system library:
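import ctypes
import ctypes.util

# find_library resolves the installed soname (usually libX11.so.6); the bare
# libX11.so symlink often exists only when the -dev package is installed
libX11 = ctypes.CDLL(ctypes.util.find_library('X11') or 'libX11.so.6')
XGrabKey = libX11.XGrabKey
XGrabKeyboard = libX11.XGrabKeyboard
print("XGrabKey: ", dir(XGrabKey))
print("XGrabKeyboard: ", dir(XGrabKeyboard))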
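Whatever mechanism ends up capturing the keys, the remapping decision from the question can be kept in a small pure function that the hook calls. The sketch below assumes the intended direction is "wsad (or hjkl) act as arrow keys while CapsLock is on"; the tables and function names are hypothetical, and emit stands for whatever actually sends the replacement key (for example pyautogui.press). It does not solve the hard part discussed above, which is suppressing the original event.

```python
# Hypothetical layouts: physical key -> arrow key to send instead
WSAD_ARROWS = {"w": "up", "a": "left", "s": "down", "d": "right"}
HJKL_ARROWS = {"h": "left", "j": "down", "k": "up", "l": "right"}


def translate(key, caps_lock_on, table=WSAD_ARROWS):
    """Return the replacement key, or None if `key` should pass through unchanged."""
    if not caps_lock_on:
        return None
    return table.get(key)


def handle(key, caps_lock_on, emit, table=WSAD_ARROWS):
    """Emit the replacement for `key` if there is one.

    Returns True when the event was remapped (so the hook should swallow the
    original key) and False when it should be delivered as-is.
    """
    replacement = translate(key, caps_lock_on, table)
    if replacement is None:
        return False
    emit(replacement)
    return True
```

Keeping the table separate from the capture code means the same logic can be reused whether the events come from pynput, keyboard or an Xlib grab.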

Bring terminal to the front in Python

Is there a way to bring the Linux terminal to the front of the screen from a Python script? Possibly using some kind of OS command.
For example, your Python script opens a GUI that fills the screen, and when a certain event happens you want to see information printed in the terminal, but you don't want to (or can't) show it in the GUI (so please don't suggest that).
And if possible, hide it back behind your other windows again, if needed.
(Python 2, by the way)
Any suggestions greatly appreciated.
Not in any generally supported way.
Some terminal applications may support the following control sequences. However, these sequences are not standardized, and most terminals do not implement them.
\e[5t - move window to front
\e[6t - move window to back
\e[2t - minimize ("iconify") window
\e[1t - un-minimize window
— from http://rtfm.etla.org/xterm/ctlseq.html
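For terminals that do honor these sequences, using them from Python is just a matter of writing the raw escape codes to the terminal the script runs in. This is a sketch: it only works if the script's stdout is that terminal (not redirected to a file or pipe), and xterm itself may ignore window operations unless its allowWindowOps resource permits them. The constant and function names are illustrative.

```python
import sys

# Window-manipulation sequences (CSI Ps t) listed above
DEICONIFY = "\x1b[1t"
ICONIFY = "\x1b[2t"
RAISE = "\x1b[5t"
LOWER = "\x1b[6t"


def window_op(sequence, stream=None):
    """Write one window-manipulation sequence to the terminal and flush it."""
    if stream is None:
        stream = sys.stdout
    stream.write(sequence)
    stream.flush()


def show_in_terminal(message, stream=None):
    """Un-minimize and raise the terminal, then print `message` in it."""
    if stream is None:
        stream = sys.stdout
    window_op(DEICONIFY, stream)
    window_op(RAISE, stream)
    stream.write(message + "\n")
    stream.flush()
```

To hide the terminal again afterwards, window_op(LOWER) or window_op(ICONIFY) sends the corresponding sequence.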
The phrase "bring the Linux terminal to the front of your screen" most likely refers to terminal emulators running in an X Window environment. Ultimately this is done by making a request to the window manager, and there is more than one way to do that.
xterm (and some other terminal emulators) implement the Sun control sequences for window manipulation (from the 1980s) which were reimplemented in dtterm during the early 1990s. xterm has done this since 1996 (patch #18).
Python: Xlib — How can I raise(bring to top) windows? mentions wmctrl, a command-line tool which allows you to make various requests to the window manager.
xdotool is another command-line tool which performs similar requests.
Finally, Can a WM raise or lower windows? points out that you can write your own application (and because Python can use shared libraries written in C, you could write a script using the X library).
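As a sketch of the command-line route from Python, the helper below shells out to wmctrl or xdotool. It assumes one of them is installed and that the terminal's window title is known (the title argument is a placeholder). wmctrl -a switches to the matching window's desktop and raises it; the xdotool variant searches windows by name and activates the first match.

```python
import shutil
import subprocess


def raise_window_command(title, tool="wmctrl"):
    """Build the argv that asks the window manager to activate a window by title."""
    if tool == "wmctrl":
        # -a: switch to the window's desktop and raise/focus it
        return ["wmctrl", "-a", title]
    if tool == "xdotool":
        # search by window name, then activate the first match
        return ["xdotool", "search", "--name", title, "windowactivate"]
    raise ValueError(f"unsupported tool: {tool!r}")


def raise_window(title):
    """Run whichever of wmctrl / xdotool is available on PATH."""
    for tool in ("wmctrl", "xdotool"):
        if shutil.which(tool):
            subprocess.run(raise_window_command(title, tool), check=True)
            return tool
    raise FileNotFoundError("neither wmctrl nor xdotool is installed")
```

Hiding the terminal again is not covered here; with xdotool, the windowminimize command is one option.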

How do I control a non-browser window that is part of Firefox?

I'm on OSX using Python 2.x, Selenium & Firefox
I'm automating testing a javascript webapp with Python & Selenium.
One of the links (Add File) in the application opens up a non-browser firefox window titled "File Upload" which looks like (/is?) a Finder window.
Is there a way that I could locate and control this window from my Python script? I know Selenium can't do it, but I'm wondering if it might be possible with something like 'import applescript', and if so, how?
I found atomac which allows me to control mac apps through their accessibility controls (which needed to be enabled on Mavericks for Aptana in System Preferences -> Security & Privacy -> Privacy -> Accessibility). Cool tool, but the documentation is pretty sparse. The examples provided on the page above got me to the point where I could close the window via the cancel button, but I had to review the function definitions in atomac's AXClasses.py to figure out the rest. Here's the solution.
import atomac, time
from atomac.AXKeyCodeConstants import *
# to allow me to make firefox frontmost while testing
time.sleep(5)
# get a reference to the running app
firefox = atomac.getAppRefByLocalizedName('Firefox')
# get the window of the reference
firefoxwindow = firefox.windowsR()[0]
# send key sequence to go to my home folder
firefoxwindow.sendKeyWithModifiers('h',[COMMAND,SHIFT])
# send key sequence to select first file there
firefoxwindow.sendKeyWithModifiers('a',[COMMAND])
# press the now active Open button
openbutton = firefoxwindow.buttons('Open')[0]
openbutton.Press()
It's theoretically possible, but really awkward. I'll give you a bunch of links--not ideal, I know, but you could write a book on this.
You'd need to start by enabling AppleScript control of the GUI. Then you'll want to read up on how to control the GUI from within AppleScript. However, you wanted to use Python and not AppleScript, so then you'll need to install PyObjC, which is a Python to Cocoa bridge. You'd need to use the Scripting Bridge framework and figure out (from the extremely thin documentation) how to translate the AppleScript docs to Python.
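A lighter-weight alternative to translating everything through PyObjC and the Scripting Bridge (a different technique from the one described above) is to let Python run AppleScript GUI scripting through the osascript command-line tool. The sketch below only builds and runs a System Events keystroke command, so GUI scripting (accessibility access) must still be enabled for the process that runs it; applescript_string, keystroke_command and press are hypothetical helper names.

```python
import subprocess


def applescript_string(text):
    """Quote `text` as an AppleScript string literal."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def keystroke_command(key, modifiers=()):
    """Build an osascript argv that types `key` via System Events.

    `modifiers` uses AppleScript's names, e.g. ("command", "shift").
    """
    script = f'tell application "System Events" to keystroke {applescript_string(key)}'
    if modifiers:
        mods = ", ".join(f"{name} down" for name in modifiers)
        script += f" using {{{mods}}}"
    return ["osascript", "-e", script]


def press(key, modifiers=()):
    """Send the keystroke to the frontmost application (macOS only)."""
    subprocess.run(keystroke_command(key, modifiers), check=True)
```

For instance, press("h", ("command", "shift")) sends the same shortcut as the atomac sendKeyWithModifiers('h', [COMMAND, SHIFT]) call in the previous answer, but to whichever application is frontmost.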

Python 3.x Interaction with other Program GUIs

I'm looking for a Python 3.x library that is able to allow interaction with other programs.
For example, I already have a command-line interface that I developed in Python, and I want to be able to enter, say, "1" and have another program open. From there, I want to enter another input like "2" and have it manipulate the GUI that opens (for example, "click" the Configurations dropdown and select an option, perhaps modify a few settings, apply, and then possibly also enter some text automatically). I'm doing this for test automation.
I've already tried pywinauto, but I found it isn't compatible with Python 3! :(
Is there another possible approach to this? Thanks in advance!!!
P.S. I may have forgotten to mention that I'm using Windows 7 with Python32.
You could look into sikuli. It lets you automate clicks and other actions based on region or matched graphic. Fairly smart. Is there a reason you're dead set on using py3?
Py3-compatible pywinauto released! New home page: http://pywinauto.github.io/
P.S. I'm maintainer of pywinauto.
Late answer, but have a look at pyautogui which enables you to move the mouse and press keys. I used it for the following snippet which launches an emulator and presses keys.
import pyautogui as pg
import os
import time
game_filepath = "../games/BalloonFight.zip"
os.system(f"fceux {game_filepath} &")
time.sleep(1)
keys_to_press = ['s', 's', 'enter']
for key_to_press in keys_to_press:
    pg.keyDown(key_to_press)
    pg.keyUp(key_to_press)
time.sleep(2)
im = pg.screenshot("./test.png", region=(0,0, 300, 400))
print(im)
A more detailed explanation can be found here: Reinforcement learning to play Nintendo NES games
I created a pywinauto fork on GitHub that's compatible with Python 3:
https://github.com/Usonaki/sendkeys-py-si-python3
I only did basic testing, so there might still be some circular import related problems that I haven't found.
