Detect Areas of Text in Screenshot - python

I'm working on a project to increase the ability for wine to automatically test software packages. What I'm looking to do now is detect text in the screen capture of the current window. I can then parse all of the text and use autohotkey to give a mouse click on the coordinates of the text I want.
For example, in firefox, I might want to test different things, the first open being opening preferences. I would then need to parse the screenshot of firefox, detect all of the separate locations of text. I can then run these separate images of text into tesseract-ocr and detect which one, says "Edit". I then redo this again for "preferences".
I've tried to find a solution but so far can't find anything. I'd prefer a solution that uses python or has python binds as thats what I've been programing in so far.

A possible starting point is Project SIKULI. It is a tool to automate GUI testing. It is written in Java, nonetheless it includes a scripting environment based on Jython, hence modifying it to support python script may be not too difficult.

Related

Run a script in Maya by right clicking UI?

I'm fairly new to python, and I am trying to run the code I wrote when I right click the playback range windows in the timeline of maya (where you type your min or max range) . I managed to find ways to run scripts within the attribute editor / timeline / shelf items, but I cant seem to figure out how to interact with Maya's UI... Even typing a new value in there doesn't update the script editor, so I don't really have a lead on where to go. Any help would be fantastic!
you basically want to add a menu item in the timeslider ?. You can get all maya widget using OpenMayaUI.MQtUtil and here is a full repo which modify timeline control https://github.com/robertjoosten/maya-timeline-marker
Everything Maya does when using the UI logs data in the script editor.
Many items in the log are suppressed though as it would spam the log.
To enable it, activate Echo all commands (can be found in script editor under History), then clear the script editor and perform the action you want logged.
You will find that Maya executes a lot of mel script functions, which you can all find in the maya install directory under [MAYA_INSTALL_DIR]/scripts/startup and [MAYA_INSTALL_DIR]/scripts/others
To find the essential function you will have to search a little through the log.
Once you got a mel function you want to look for, I recommend using a find in files search function like the one in Notepad++, to find the corresponding .mel file in which the global proc is defined.
Also note that UI element names that are logged in script editor may have a different name the next time you start maya. So you will have to add functionality to search for the correct UI element name.
The Maya mel scripts are a good resource to find out all sorts of things about how the UI works, including contextual marking menus and the commands of tool windows.

interacting with a user created gui dynamically using python

There is a user created GUI which is an .exe that loads a DLL. This GUI has a bunch of sliders, check boxes etc. I would like to move two sliders at the same time using python automatically without me using my mouse to move the sliders. I would like to know the python modules that an be used to achieve this purpose.
You can try using SikuliX.
It can be used to automate mouse or keyboard actions based on pattern matching. It is originally developed for Java I think, but it can be used with Python.
Basically, anything you can do yourself manually can be done with it.
The developer seems to be pretty active, so you can easily seek help from him if you have issues.

How to read/edit a GUI/MFC application in Python?

I want to automate one of my tasks, by changing a third-party GUI/MFC application's properties as per my requirements. Every time I need to carry out any testing, I need to change the properties of the application to test my software.
I tried to automate it by using Python and IronPython. After Googling a lot I found IronPython, because the GUI is written C# and VB.NET.
Suppose when opening the GUI in its editor it gives me the option to edit the properties, MFC contains lots of controls.. e.g.:
Enter Time |__| //Need to enter the value in the box
Enter the dealy |__| //Need to enter the value in the box
Want to display |_| //Check box , check or uncheck
some Radio buttons.
Some more controls.
....
....
I want to control all the changes from my Python script. I will just enter the value from my script and it will update them in the GUI.
I wrote a script in IronPython to read the GUI:
fw = open("MyFile.vnb", 'r')
for line in fw.readlines():
print (line)
I found plenty of encrypted/encoded characters along with some of the C#/VB.NET codes in the console. So, I am completely stuck here.
I would like to know can we edit a third-party GUI with Python/IronPython or not? Do I need to use some special tools from Python to edit the GUI?
If you need classic GUI automation you can control MFC application by pywinauto library. You can send keyboard events, mouse clicks, moves etc. pywinauto has also limited support for .NET controls (simple automation like buttons, text boxes etc. is available). I guess this is realistic task (see sample video by the link above).
But if you need some binary instrumentation to change the GUI executable permanently (it sounds strange), this is completely another topic. Read about PIN tool. It's used for profilers development, for example. Collecting stack samples, unwinding call stacks and other tricky reverse engineering things. :)

Python Script (main) + Blender "face" animation

SO I am infact doing something very similar to this user posts:
https://stackoverflow.com/questions/6800292/python-ai-and-3d-animation
but it has no answers and I couldn't contact the user.
Basically I have a functioning python script that answers me with an action accordingly to my voice command. (Fetch emails, weather forecast, turn lights ON/OFF, etc), it has been made using the pyspeech library which is pretty darn good.
Now I want to give my programm a "face"! I thought about modelling the face with Blender (have some knowledge and would build up on it) and I know I could animate it, so the lips move and such.
So I want to know if it is at all possible to:
Load the "face" that I made from blender from my main python script (so when my programm start the face would be there on the screen too)
Run from the script the animations such that when for example when my programm says "You're welcome" I would run the animation that the lips move on the face to simulate it is speaking.
I know that blender has a good python integration (maybe correct is to say it is built on?) and that is why I thought it would be a good program to use.
Hope someone can help and tell me if that is at all possible and maybe show me some right way to go, my googling just showed me always python scripting with Blender which is not what I exactly need here... I think...
Cheers,
Flavio
Indeed, what you want is possible.
If all you want is to play pre-rendered animation videos based on decisions on your program, any GUI that allows you to embedd and play video in a widget will do for your application.
You could rool out your own GUI using Pygame (which has video support, but you will need one of the "minor" more or less "amateur" widget toolkits made for pygame to make up the remaining of your application, as pygame is pretty low level.
On a higher level, although I'had not embedded video, I think you could go with PyQT4 (googled a bit, not that many examples either, buthints that there are eamples in QT4 source) or GTK+ (the samething, it looks like there are more examples).
Another option would be to build your application to run inside the Blener Game Engine itself - It offers both a high level Toolkit, and ways to customize behaviors to user actions (even without coding).
The major drawback in doing this is: I don't know which are the options to distribute an application that needs Blender Game Engine nowadays - your users will need to install Blender (but it is likely Blender folks made an easy way to jhandle this).
On the upper hand: you get the most flexibility, it would even be possible to render some sequences in realtime (as opposed to pre-rendered videos) in your app.
One thing: Blender nowadays use Python 3.x - if the other libraries you need are Python 2, you willl need to make one different process for the GUI inside Blender, and exchange data with your application's backeend in Python 2 (for example using jsonrpc or xmlrpc - that is enoguh simple in Python).

Python widget/cursor detection?

Beginner python learner here. I have a question that I have tried to Google but I just can't come up with the proper way to ask in just a few words (partly because I don't know the right terminology.)
How do I get python to detect other widgets? For example, if I wanted a script to check and see when I click my mouse if that click put focus on an entry widget on a (for example) website. I've been trying to get it to work in Tkinter and I can't figure out even where to begin.
I've seen this:
focus_displayof(self)
Return the widget which has currently the focus on the
display where this widget is located.
But the return value for that function seems to be some ambiguous long number I can't decipher, plus it only works in its own application.
Any direction would be much appreciated. :)
Do you mean inside your own GUI code, or some other application's/website's?
Sounds like you're looking for a GUI driver, or GUI test/automation driver. There are tons of these, some great, some awful, many abandoned. If you tell us more about what you want that will help narrow down the choices.
Is this for testing, or automation, or are you going to drive the mouse and button yourself and just want something to observe what is going on under the hood in the GUI?
>How do I get Python to detect other widgets?
On a machine, or in a browser? If in a machine, which platform: Linux/Windows (which)/Mac?
If in a browser, which browser (and major version)?
> But the return value for that function seems to be some ambiguous long number I can't decipher
Using longs as resource handles is par for the course, although good GUI drivers also work with string/regex matching on window and button names.
> plus it only works in its own application.
What do you mean, and what are you expecting it to return you? You should be able to look up that GUI object and access its title. Look for a GUI driver that works with window and button names.
Here is one list, read it through and see what sounds useful. I have used AutoIt under Win32, it's great, widely-used and actively-maintained; it can be called from Python (via subprocess).
Here are comparisons by the author of PyWinAuto on his and similar tools. Give a read to his criticisms of its structure from 2010. If none of these is what you want, at least you now have the vocabulary to tell us what would be...

Categories

Resources