Automatically controlling an app on the PC using openCV Python

I am working on a project to control a PC entirely by voice and by gestures (via webcam). With voice control I open an app (for example, YouTube). Then, without typing anything in the search bar, I want to enter text by voice (without touching the keyboard at all): if I say "search water videos", the cursor should automatically perform the search and show me the results.
Basically, I want to find a text box on an app's screen using image processing.
There would be some predefined keywords, such as Search (for searching), Delete (for deleting anything that is mistyped), Go Back (to return to the previous window), and Exit (for exiting the app).
Can this be done with OpenCV in Python?
Many thanks in advance!
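
The predefined keywords described above could be recognized with a small parser along these lines. This is only a sketch: it assumes the speech has already been converted to text by some recognizer, and COMMANDS and parse_command are hypothetical names, not part of OpenCV or any other library.

```python
# Keywords taken from the question; multi-word keywords are listed first.
COMMANDS = ("go back", "search", "delete", "exit")


def parse_command(utterance):
    """Split recognized speech into (keyword, argument).

    Returns (None, text) when the utterance starts with no known keyword.
    """
    # Lowercase and collapse repeated whitespace before matching.
    text = " ".join(utterance.lower().split())
    for keyword in COMMANDS:
        if text == keyword or text.startswith(keyword + " "):
            return keyword, text[len(keyword):].strip()
    return None, text
```

For example, parse_command("search water videos") yields the keyword "search" and the argument "water videos", which the rest of the program could then type into whatever text box it has located.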

Related

Trying to find a way to automate the button-clicking process in a company application without images

I have a script that logs in to a company-made application and clicks the right buttons ("Continue", "OK", etc.) to perform a certain process. However, to do this with pyautogui I have had to use screenshots of the buttons I want to click. Is there any package or way to automate this process without using images? Maybe it could detect the text of a button and click it. I do not have identifiers for the buttons and have no access to the code or other information behind the application. Let me know if you have any ideas. Thanks!

I have a few questions that may be helpful:
Does the layout of the buttons change? If it's always the same you can just program the correct locations and timing and not worry about reading the screen.
If you really have to read the screen, look into optical character recognition (OCR).
Is the application keyboard accessible? If so, using Tab and Enter to activate the buttons is simpler than controlling the mouse. Also, if it was made by superstars, you can use find (Ctrl+F) to search for the text on the buttons and jump to them.
This answer is pretty vague, but I can only be as specific as the question asked.
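
The OCR suggestion could be sketched roughly as follows. This assumes the third-party packages pyautogui and pytesseract are installed (plus the Tesseract binary), and that screenshot coordinates match click coordinates, which may not hold on scaled or high-DPI displays. The function names find_text_center and click_button_by_text are hypothetical.

```python
def find_text_center(ocr, target, min_conf=60):
    """Return the (x, y) center of the first OCR word equal to target, or None.

    `ocr` is a dict of parallel lists with the keys 'text', 'conf', 'left',
    'top', 'width' and 'height', the shape produced by
    pytesseract.image_to_data(..., output_type=pytesseract.Output.DICT).
    """
    wanted = target.strip().lower()
    for i, word in enumerate(ocr["text"]):
        if word.strip().lower() != wanted:
            continue
        # Confidence may arrive as int, float or string depending on version.
        if float(ocr["conf"][i]) < min_conf:
            continue
        x = ocr["left"][i] + ocr["width"][i] // 2
        y = ocr["top"][i] + ocr["height"][i] // 2
        return (x, y)
    return None


def click_button_by_text(target):
    """Screenshot the screen, OCR it, and click the matching word if found."""
    import pyautogui    # third-party, imported lazily
    import pytesseract  # third-party; needs the Tesseract binary installed

    shot = pyautogui.screenshot()
    ocr = pytesseract.image_to_data(shot, output_type=pytesseract.Output.DICT)
    center = find_text_center(ocr, target)
    if center is None:
        return False
    pyautogui.click(*center)
    return True
```

This only matches single-word labels such as "Continue" or "OK"; a multi-word label would need neighbouring words to be joined first.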

How to read/edit a GUI/MFC application in Python?

I want to automate one of my tasks, by changing a third-party GUI/MFC application's properties as per my requirements. Every time I need to carry out any testing, I need to change the properties of the application to test my software.
I tried to automate this with Python and IronPython. After a lot of Googling I settled on IronPython, because the GUI is written in C# and VB.NET.
When I open the GUI in its editor, it gives me the option to edit the properties. The MFC dialog contains lots of controls, e.g.:
Enter Time |__| //Need to enter the value in the box
Enter the delay |__| //Need to enter the value in the box
Want to display |_| //Check box , check or uncheck
some Radio buttons.
Some more controls.
....
....
I want to control all the changes from my Python script. I will just enter the value from my script and it will update them in the GUI.
I wrote a script in IronPython to read the GUI:
with open("MyFile.vnb", 'r') as fw:
    for line in fw:
        print(line)
The console showed plenty of encrypted/encoded characters along with some C#/VB.NET code, so I am completely stuck here.
I would like to know whether a third-party GUI can be edited with Python/IronPython at all. Do I need special Python tools to edit the GUI?

If you need classic GUI automation, you can control an MFC application with the pywinauto library. You can send keyboard events, mouse clicks, mouse moves, etc. pywinauto also has limited support for .NET controls (simple automation of buttons, text boxes, etc. is available). I think this is a realistic task (see the sample video at the link above).
But if you need binary instrumentation to change the GUI executable permanently (which sounds strange), that is a completely different topic. Read about the Pin tool. It is used, for example, to develop profilers: collecting stack samples, unwinding call stacks and other tricky reverse-engineering things. :)
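
A minimal sketch of the pywinauto route, under several assumptions: the application is already running, its controls are visible to pywinauto's "uia" backend, and the control titles ("Enter Time", "Want to display") really match the labels in the question, which has to be checked against the real application. apply_settings and configure_app are hypothetical helper names.

```python
def apply_settings(dialog, text_fields, checkboxes):
    """Fill edit boxes and set check boxes on a pywinauto dialog specification.

    text_fields: {control title: text}; checkboxes: {control title: bool}.
    The titles are assumptions; inspect the real application to find them.
    """
    for title, value in text_fields.items():
        dialog.child_window(title=title, control_type="Edit").set_edit_text(value)
    for title, wanted in checkboxes.items():
        box = dialog.child_window(title=title, control_type="CheckBox")
        # Toggle only when the current state differs from the wanted one.
        if bool(box.get_toggle_state()) != wanted:
            box.toggle()


def configure_app(window_title_re, text_fields, checkboxes):
    """Attach to a running application by window title and apply the settings."""
    from pywinauto import Application  # third-party, Windows only

    app = Application(backend="uia").connect(title_re=window_title_re)
    apply_settings(app.window(title_re=window_title_re), text_fields, checkboxes)
```

A call might look like configure_app(".*MyTool.*", {"Enter Time": "10"}, {"Want to display": True}), where the window title pattern is again made up for illustration.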

Python Script (main) + Blender "face" animation

I am in fact doing something very similar to this user's post:
https://stackoverflow.com/questions/6800292/python-ai-and-3d-animation
but it has no answers and I couldn't contact the user.
Basically I have a working Python script that responds to my voice commands with the corresponding action (fetch emails, weather forecast, turn lights on/off, etc.). It was made using the pyspeech library, which is pretty darn good.
Now I want to give my program a "face"! I thought about modelling the face in Blender (I have some knowledge and would build on it), and I know I could animate it so that the lips move and such.
So I want to know if it is at all possible to:
Load the "face" that I made in Blender from my main Python script (so that when my program starts, the face is on the screen too)
Run the animations from the script, so that when my program says, for example, "You're welcome", the lips on the face move to simulate speaking.
I know that Blender has good Python integration (maybe it is more correct to say it is built on it?), and that is why I thought it would be a good program to use.
I hope someone can tell me whether this is possible at all and maybe point me in the right direction. My googling always turned up Python scripting within Blender, which is not exactly what I need here... I think...
Cheers,
Flavio

Indeed, what you want is possible.
If all you want is to play pre-rendered animation videos based on decisions made by your program, any GUI toolkit that lets you embed and play video in a widget will do for your application.
You could roll your own GUI using Pygame, which has video support. However, Pygame is pretty low level, so you would need one of the "minor", more or less "amateur" widget toolkits made for Pygame to build the rest of your application.
On a higher level, although I have not embedded video myself, I think you could go with PyQt4 (I googled a bit and found few examples, but there are hints that examples exist in the Qt4 source) or GTK+ (the same story, though there seem to be more examples).
Another option would be to build your application to run inside the Blender Game Engine itself. It offers both a high-level toolkit and ways to customize behavior in response to user actions (even without coding).
The major drawback of doing this is that I don't know what the options are nowadays for distributing an application that needs the Blender Game Engine: your users will need to install Blender (but it is likely the Blender folks made an easy way to handle this).
On the upside, you get the most flexibility: it would even be possible to render some sequences in real time (as opposed to pre-rendered videos) in your app.
One thing: Blender nowadays uses Python 3.x. If the other libraries you need are Python 2 only, you will need to run the GUI inside Blender as a separate process and exchange data with your application's Python 2 backend (for example using JSON-RPC or XML-RPC, which is simple enough in Python).
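
The two-process idea can be sketched with XML-RPC from the standard library. Here both ends run in one Python 3 process purely for demonstration; in the setup described above, the server part would live in the GUI process and the client call in the backend (a Python 2 backend would import xmlrpclib instead of xmlrpc.client). The function play_animation and the animation name are hypothetical.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer

played = []  # animations the GUI side was asked to play


def play_animation(name):
    """Hypothetical hook: the GUI process would start the named animation here."""
    played.append(name)
    return "playing " + name


def start_gui_server(host="127.0.0.1", port=0):
    """Serve play_animation over XML-RPC in a background thread.

    Port 0 asks the operating system for any free port.
    """
    server = SimpleXMLRPCServer((host, port), logRequests=False)
    server.register_function(play_animation)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def request_animation(port, name, host="127.0.0.1"):
    """Backend side: ask the GUI process to play an animation."""
    with ServerProxy("http://%s:%d" % (host, port)) as proxy:
        return proxy.play_animation(name)


server = start_gui_server()
reply = request_animation(server.server_address[1], "youre_welcome")
server.shutdown()
server.server_close()
```

The backend only sends short strings, so neither side needs to know which Python version or GUI toolkit the other one uses.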

Python widget/cursor detection?

Beginner Python learner here. I have a question that I have tried to Google, but I just can't come up with the proper way to ask it in a few words (partly because I don't know the right terminology).
How do I get Python to detect other widgets? For example, I want a script to check, when I click my mouse, whether that click put focus on an entry widget on (for example) a website. I've been trying to get this to work in Tkinter and I can't figure out where to even begin.
I've seen this:
focus_displayof(self)
Return the widget which has currently the focus on the
display where this widget is located.
But the return value for that function seems to be some ambiguous long number I can't decipher, plus it only works in its own application.
Any direction would be much appreciated. :)

Do you mean inside your own GUI code, or some other application's/website's?
Sounds like you're looking for a GUI driver, or GUI test/automation driver. There are tons of these, some great, some awful, many abandoned. If you tell us more about what you want that will help narrow down the choices.
Is this for testing, or automation, or are you going to drive the mouse and button yourself and just want something to observe what is going on under the hood in the GUI?
> How do I get Python to detect other widgets?
On a machine, or in a browser? If on a machine, which platform: Linux, Windows (which version), or Mac?
If in a browser, which browser (and major version)?
> But the return value for that function seems to be some ambiguous long number I can't decipher
Using longs as resource handles is par for the course, although good GUI drivers also work with string/regex matching on window and button names.
> plus it only works in its own application.
What do you mean, and what are you expecting it to return you? You should be able to look up that GUI object and access its title. Look for a GUI driver that works with window and button names.
Here is one list; read it through and see what sounds useful. I have used AutoIt under Win32. It's great, widely used and actively maintained, and it can be called from Python (via subprocess).
Here are comparisons by the author of PyWinAuto on his and similar tools. Give a read to his criticisms of its structure from 2010. If none of these is what you want, at least you now have the vocabulary to tell us what would be...
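
To illustrate the point about handles versus names: a GUI driver typically keeps the numeric handle internally and lets you select windows by matching their titles. The sketch below shows just that lookup on made-up data; find_windows and the (handle, title) pairs are hypothetical and do not come from any particular driver.

```python
import re


def find_windows(windows, title_pattern):
    """Return the handles of all windows whose title matches the regex.

    `windows` is an iterable of (handle, title) pairs, as a GUI driver
    might report them.
    """
    rx = re.compile(title_pattern, re.IGNORECASE)
    return [handle for handle, title in windows if rx.search(title)]


# Made-up example data: numeric handles paired with window titles.
windows = [
    (66052, "Untitled - Notepad"),
    (131594, "Mozilla Firefox"),
    (197130, "notes.txt - Notepad"),
]
notepad_handles = find_windows(windows, r"notepad$")
```

The long number returned by a function such as focus_displayof plays the same role as the handles here: it identifies the widget, and the readable name has to be looked up from it.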

Detect Areas of Text in Screenshot

I'm working on a project to improve Wine's ability to automatically test software packages. What I'm looking to do now is detect text in a screen capture of the current window. I can then parse all of the text and use AutoHotkey to send a mouse click to the coordinates of the text I want.
For example, in Firefox I might want to test different things, the first one being opening Preferences. I would need to parse the screenshot of Firefox and detect all of the separate locations of text. I could then run these separate images of text through tesseract-ocr and find the one that says "Edit". I would then repeat this for "Preferences".
I've tried to find a solution but so far can't find anything. I'd prefer a solution that uses Python or has Python bindings, as that's what I've been programming in so far.

A possible starting point is Project SIKULI, a tool for automating GUI testing. It is written in Java, but it includes a scripting environment based on Jython, so adapting it to run Python scripts may not be too difficult.
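
Independently of SIKULI, the step the question describes (finding the separate areas of text in a screenshot) is often approximated with OpenCV by thresholding, dilating so that letters merge into blobs, and taking the bounding boxes of the blobs. The sketch below assumes the opencv-python package, dark text on a light background, and kernel and size limits that are guesses to be tuned per application; keep_text_sized and find_text_regions are hypothetical names.

```python
def keep_text_sized(boxes, min_w=8, min_h=6, max_h=60):
    """Drop boxes too small (noise) or too tall (images, panels) to be text."""
    return [(x, y, w, h) for (x, y, w, h) in boxes
            if w >= min_w and min_h <= h <= max_h]


def find_text_regions(image_path):
    """Return (x, y, w, h) boxes around likely text areas in a screenshot."""
    import cv2  # third-party (opencv-python), imported lazily

    image = cv2.imread(image_path)
    if image is None:
        raise FileNotFoundError(image_path)
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    # Otsu threshold, inverted so that dark text becomes white foreground.
    _, binary = cv2.threshold(gray, 0, 255,
                              cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    # A wide, short kernel smears neighbouring letters into one blob.
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (15, 3))
    dilated = cv2.dilate(binary, kernel, iterations=1)
    # [-2] picks the contour list in both OpenCV 3 and OpenCV 4.
    contours = cv2.findContours(dilated, cv2.RETR_EXTERNAL,
                                cv2.CHAIN_APPROX_SIMPLE)[-2]
    boxes = [cv2.boundingRect(c) for c in contours]
    # Sort top to bottom, then left to right.
    return sorted(keep_text_sized(boxes), key=lambda b: (b[1], b[0]))
```

Each returned box could then be cropped from the screenshot and passed to tesseract-ocr, and the center of the box that reads "Edit" used as the click position.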
