How do I get the active window on Gnome Wayland? - python

Background: I'm working on a piece of software called ActivityWatch that logs what you do on your computer. Basically an attempt at addressing some of the issues with: RescueTime, selfspy, arbtt, etc.
One of the core things we do is log information about the active window (class and title). In the past, this has been done on Linux using xprop, and now with python-xlib, without issue.
But now we have a problem: Wayland is on the rise, and as far as I can see Wayland has no notion of an active window. So my fear is that we will have to implement support for each and every desktop environment available for Wayland (assuming they'll provide the capability to get information about the active window at all).
Hopefully they'll eventually converge and have some common interface to get this done, but I'm not holding my breath...
I've been anticipating this issue. But today we got our first user request for Wayland support by an actual Wayland user. As larger distros are adopting Wayland as the default display server protocol (Fedora 25 is already using it, Ubuntu will switch in 17.10 which is coming soon) the situation is going to get more critical over time.
Relevant issues for ActivityWatch:
https://github.com/ActivityWatch/aw-watcher-window/issues/18
https://github.com/ActivityWatch/activitywatch/issues/92
There are other applications like ActivityWatch that would require the same functionality (RescueTime, arbtt, selfspy, etc.). They don't seem to support Wayland right now, and I can't find any details about them planning to do so.
I'm now interested in implementing support for Gnome to start off with and follow up with others as the path becomes more clear.
A similar question concerning Weston has been asked here: get the list of active windows in wayland weston
Edit: I asked in #wayland on Freenode, got the following reply:
15:20:44 ErikBjare Hello everybody. I'm working on a piece of self-tracking software called ActivityWatch (https://github.com/ActivityWatch/activitywatch). I know this isn't exactly the right place to ask, but I was wondering if anyone knew anything about getting the active window in any Wayland-using DE.
15:20:57 ErikBjare Created a question on SO: https://stackoverflow.com/questions/45465016/how-do-i-get-the-active-window-on-gnome-wayland
15:21:25 ErikBjare Here's the issue in my repo for it: https://github.com/ActivityWatch/activitywatch/issues/92
15:22:54 ErikBjare There are a bunch of other applications that depend on it (RescueTime, selfspy, arbtt, ulogme, etc.) so they'd need it as well
15:24:23 blocage ErikBjare, in the core protocol you cannot know which windnow has the keyboard or cursor focus
15:24:39 blocage ErikBjare, in the wayland core protocol *
15:25:10 blocage ErikBjare, you can just know if your window has the focus or not, it a design choise
15:25:23 blocage avoid client spying each other
15:25:25 ErikBjare blocage: I'm aware, that's my reason for concern. I'm not saying it should be included or anything, but as it looks now every DE would need to implement it themselves if these kind of applications are to be supported
15:25:46 ErikBjare So wondering if anyone knew the teams working with Wayland on Gnome for example
15:26:11 ErikBjare But thanks for confirming
15:26:29 blocage ErikBjare, DE should create a custom extension, or use D-bus or other IPC
15:27:31 blocage ErikBjare, I guess some compositor are around here, but I do not know myself if there is such extension already
15:27:44 blocage compositor developers *
15:28:36 ErikBjare I don't think there is (I've done quite a bit of searching), so I guess I need to catch the attention of some DE developers
15:29:16 ErikBjare Thanks a lot though
15:29:42 ErikBjare blocage: Would you mind if I shared logs of our conversation in the issue?
15:30:05 blocage just use it :) it's public
15:30:19 ErikBjare ty :)
Edit 2: Filed an enhancement issue in the Gnome bugtracker.
tl;dr: How do I get the active window on Gnome when using Wayland?

The two previous answers are outdated. This is the current state of querying appnames and titles of windows in (Gnome) Wayland. There are two options:
1. A Gnome-specific JavaScript API, which can be accessed over DBus
2. The wlr-foreign-toplevel-management Wayland protocol (unfortunately not implemented by Gnome)
The Gnome-specific API will likely break between Gnome versions, but it works. It is heavily dependent on Gnome internal API to work so there is no chance of it becoming a standard API. There is a PR on aw-watcher-window to add this, but it needs some clean-up and afk-support if that's possible.
The wlr-foreign-toplevel-management protocol is (at the time of writing this) implemented by the Sway, Mir, Phosh and Wayfire compositors. Together with the idle.xml protocol, which is pretty widely implemented by Wayland compositors, there's a complete implementation with afk-detection for ActivityWatch in aw-watcher-window-wayland. I've been in discussions with sway/rootston developers about whether Wayland appnames and X11 wm_class are interchangeable, and both Sway and Phosh now use them interchangeably, so there should no longer be any distinguishable difference between Wayland and XWayland windows in the API.
I have not researched if KWin has some API similar to Gnome Shell to fetch appnames and titles, but it does at least not implement wlr-foreign-toplevel-management.

In my opinion, neither Wayland itself nor any available library is the answer here (there isn't one). In gnome-wayland, the component that actually knows about the active window is Mutter, so you need a way to ask Mutter for it. Gnome could develop an API that internally asks Mutter for the active window and restores the functionality, but right now there is no place to ask for it. Mutter will not develop an API that exposes its internal representation, because that would be specific to Mutter and would not apply to all Wayland window managers. So this needs to be added in an external library, which could talk to whichever window manager is in use and resolve your request in a general way.
Another possibility is to add a Wayland plugin through which all window managers share the current active window, plus a library that talks directly to Wayland to restore the functionality.
So your app has a big problem. The most you can do is request this from Mutter (which knows the active window), but in my opinion it cannot be resolved in Mutter alone.
I hope this helps and that you find a way. Good luck.

https://stackoverflow.com/a/64030239/388010 has the correct answer. Nevertheless, here's the concrete and unsatisfying solution that implements option (1). The following works through a gnome extension using Gnome 43 at the time of writing (and perhaps will keep on working as long as the extension is maintained):
Install https://extensions.gnome.org/extension/4974/window-calls-extended/
Run gdbus call --session --dest org.gnome.Shell --object-path /org/gnome/Shell/Extensions/WindowsExt --method org.gnome.Shell.Extensions.WindowsExt.FocusPID | sed -E "s/\\('(.*)',\\)/\\1/g" to get the PID of the focus window or use a different method of WindowsExt.
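To use this from Python (for example in a window watcher), the same gdbus call can be wrapped with subprocess. This is only a minimal sketch, assuming the extension above is installed and gdbus is on the PATH. The function names parse_gdbus_string and focused_window_pid are made up for this illustration, and the parsing assumes the reply has the single-string form ('…',) that the sed expression above expects.

```python
import re
import subprocess

# Same D-Bus call as the gdbus command above, split into an argument list.
GDBUS_CMD = [
    "gdbus", "call", "--session",
    "--dest", "org.gnome.Shell",
    "--object-path", "/org/gnome/Shell/Extensions/WindowsExt",
    "--method", "org.gnome.Shell.Extensions.WindowsExt.FocusPID",
]


def parse_gdbus_string(output):
    """Extract the value from a gdbus reply of the form ('value',)."""
    match = re.fullmatch(r"\('(.*)',\)", output.strip())
    if match is None:
        raise ValueError("unexpected gdbus output: %r" % output)
    return match.group(1)


def focused_window_pid():
    """Hypothetical helper: ask the extension for the focused window's PID."""
    result = subprocess.run(GDBUS_CMD, capture_output=True, text=True, check=True)
    return parse_gdbus_string(result.stdout)
```

Calling focused_window_pid() in a polling loop would give the PID as a string; other WindowsExt methods could be queried the same way by changing the --method argument.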

I have a script called preguiça.py, that does exactly what you're doing, though it is probably a lot simpler and I haven't released it.
For my script, I acquired the window title using PyGObject's Window Navigator Construction Kit (Wnck).
Here's a simplified version of it, with the essential parts:
import gi
gi.require_version("Wnck", "3.0")
from gi.repository import GLib, Wnck
def changed(screen, previous_window, data):
    print("Changed!")
    # The signal passes the previously active window, so ask for the current one
    window = screen.get_active_window()
    if window:
        print("Title: %s" % window.get_name())
screen = Wnck.Screen.get_default()
screen.connect("active-window-changed", changed, None)
main_loop = GLib.MainLoop()
try:
    main_loop.run()
except KeyboardInterrupt:
    print("Hey")
The callback receives the previously active window rather than the new one, which is why the example calls screen.get_active_window() to get the window that now has focus.
I wrote it for X, and it didn't complain when I switched to Wayland, so it should probably work for you.
Note it doesn't get the information from Wayland, as you asked, but it is probably actually better, as it will be X/Wayland-agnostic. It got the title from an xterm I opened, so it should be toolkit-agnostic, as well.
Don't ask me on the details of the implementation, though. The code is at least four years old :)

Related

Why does Python's IDLE crash when I type a parenthesis on Mac?

Ok, I realize this may be an extremely nuanced question, but it has been bugging me for a while. I like the simple scripting interface of IDLE, but it keeps crashing on me when: (1) I am coding on an external monitor and (2) I type the parenthesis button, "(". IDLE never crashes for me for any other reason than this very specific situation. Strangely, if I have an external monitor connected, but I have the IDLE dev window on my laptop's main screen, I have ZERO problems with crashing. (???) I have lost a substantial amount of code due to this problem.
I am running on Mac OSX Version 10.11.3 and I have a MacBook Pro (Retina, 15-inch, Mid 2015) Any thoughts would be appreciated!
Ok, answering my own question. Per the recommendation of Андрей, I reviewed the notes and comments here: http://bugs.python.org/issue16177 I did some experimentation and figured out a workaround to avoid this problem. The problem only occurs when you are coding on an external monitor AND when the "Arrangement" of the external monitor is set as being higher (or elevated) relative to the primary monitor. Specifically, it occurs when the IDLE development window is totally or near-totally in a space on the secondary screen that would be considered "North" of the top edge of the primary screen. Thus, the workaround is to reconfigure your "Arrangement" settings on your Mac so that the monitors are aligned in a near-horizontal fashion. This may make things feel less natural, but it will fix the problem. That being said, I have no idea what the root cause of the problem is. I'm just glad to finally have this figured out. Hope this helps at least one other person.
I found a fix! One that doesn't require changing monitor settings.
In IDLE:
Options Menu > Configure Extensions > CallTips > set to FALSE
Then restart.
Took much research to find that super simple solution... the problem is caused not by an error in IDLE but by an error in the mac's Tcl/Tk code when calltips are called in external monitors above the default monitor.
Typing '(' after a function name should bring up a calltip giving the signature of the function if the function is currently known. Functions can be made known by occasionally running your code. We recently discovered that some combinations of Mac OS X or macOS and tcl/tk require the addition of one line to idlelib/calltip_w.py (3.6+) or idlelib/CallTipWindow.py (3.5-). See issue 34275.
self.label.pack() # Line 74
tw.update_idletasks() # ADD THIS LINE!
tw.lift()
Without this, the calltip does not appear. I don't know if this also prevents any of the crashes that people have reported. If the above does not work, please remove _idletasks and let me know in a comment.

Cross-platform multimedia keys in Python

I'm just starting on an application that will need to be able to receive multimedia key (play/pause, skip, previous) presses. I'm looking to target Mac, Linux (major distros), and Windows. I've seen a solution for GNOME that appears to do what I need, but as simple as it sounds, never anything that can pick up those keys on all major platforms. I also need to be able to pick up the keys globally, since the application will run in the background and won't ever have focus.
Currently, I'm not strongly tied to Python, but since I'd like to be able to target multiple platforms, Python seemed like the way to go. Has anyone written any cross-platform libraries that can do this? I haven't been able to find any that work.
PyQT looks like a potentially viable option, but some people have hinted that global key detection may be problematic on OSX.
With PyQt (or PySide) you can use the Qt::AA_CaptureMultimediaKeys application flag to enable cross-platform capturing of multimedia keys. In principle, using that flag your Qt program should be able to receive keyboard events when the user presses multimedia keys such as Play (Qt::Key_MediaPlay), Stop (Qt::Key_MediaStop), Pause (Qt::Key_MediaPause) etc. For a full list of supported keys, have a look at the documentation.
I cannot say if all keys will be supported on all platforms, but in general Qt aims to provide very good interoperability between different operating systems. I think with a simple prototype you should be able to answer that question really quick (I don't have access to a MacOS environment so I cannot test it there, but for Windows & Linux it should work). For more information on how to process keyboard events using Qt, have a look at the documentation of the QKeyEvent class.
I took over python-mmkeys a few months ago. I actually never tried to compile it, but it was included in the code of a project.
It is PyGTK dependent, but it is available on GNU/Linux, Mac OS X, and Windows.
The code is pretty easy to use:
import mmkeys
keys = mmkeys.Mmkeys()
keys.connect("mm_prev", previous_cb)
keys.connect("mm_next", next_cb)
keys.connect("mm_playpause", playpause_cb)

Something like pyHook on OS X

I am actually working with pyHook, but I'd like to write my program for OS X too.
If anyone knows of such a module... I've been looking on the internet for a while, but found nothing really relevant.
-> The idea is to be able to record keystrokes outside the python app. My application is a community statistics builder, so it would be great to have statistics from OS X too.
Thanks in advance ;)
Edit:
PyHook : Record keystrokes and other things outside the python app
http://sourceforge.net/apps/mediawiki/pyhook/index.php?title=PyHook_Tutorial
http://pyhook.sourceforge.net/doc_1.5.0/
http://sourceforge.net/apps/mediawiki/pyhook/index.php?title=Main_Page
As far as I know, there is no Python library for this, so you're going to be calling native APIs. The good news is that PyObjC (which comes with the built-in Python on recent OS releases) often makes that easy.
There are two major options. For either of these to work, your app has to have a Cocoa/CoreFoundation runloop (just as in Windows, a lot of things require you to be a "Windows GUI executable" rather than a "command line executable"), which I won't explain how to do here. (Find a good tutorial for building GUI apps in Python, if you don't know how, because that's the simplest way.)
The easy option is the Cocoa global event monitor API. However, it has some major limitations. You only get events that are going to another app--which means media keys, global hotkeys, and keys that are for whatever reason ignored will not show up. Also, you need to be "trusted for accessibility". (The simplest way to do that is to ask the user to turn it on globally, in the Universal Access panel of System Preferences.)
The hard option is the Quartz event tap API. It's a lot more flexible, and it only requires exactly the appropriate rights (which, depending on the settings you use, may include being trusted for accessibility and/or running as root), and it's a lot more powerful, but it takes a lot more work to get started, and it's possible to screw up your system if you get it wrong (e.g., by eating all keystrokes and mouse events so they never get to the OS and you can't reboot except with the power button).
For references on all of the relevant functions, see https://developer.apple.com/library/mac/#documentation/Cocoa/Reference/ApplicationKit/Classes/nsevent_Class/Reference/Reference.html (for NSEvent) and https://developer.apple.com/library/mac/#documentation/Carbon/Reference/QuartzEventServicesRef/Reference/reference.html (for Quartz events). A bit of googling should turn up lots of sample code out there in Objective C (for NSEvent) or C (for CGEventTap), but little or nothing in Python, so I'll show some little fragments that illustrate how you'd port the samples to Python:
# Option 1: Cocoa global event monitor
import Cocoa
def evthandler(event):
    pass  # this is where you do stuff; see NSEvent documentation for event
observer = Cocoa.NSEvent.addGlobalMonitorForEventsMatchingMask_handler_(
    Cocoa.NSKeyDownMask, evthandler)
# when you're done
Cocoa.NSEvent.removeMonitor_(observer)
# Option 2: Quartz event tap
import Quartz
def evthandler(proxy, type, event, refcon):
    pass  # Here's where you do your stuff; see CGEventTapCallback
    return event
source = Quartz.CGEventSourceCreate(Quartz.kCGEventSourceStateHIDSystemState)
tap = Quartz.CGEventTapCreate(
    Quartz.kCGSessionEventTap,
    Quartz.kCGHeadInsertEventTap,
    Quartz.kCGEventTapOptionListenOnly,
    (Quartz.CGEventMaskBit(Quartz.kCGEventKeyDown) |
     Quartz.CGEventMaskBit(Quartz.kCGEventKeyUp)),
    evthandler,
    None)  # refcon: arbitrary user data handed back to the callback
Another option, at about the same level as Quartz events, is Carbon events (starting with InstallEventHandler). However, Carbon is obsolete, and on top of that, it's harder to get at from Python, so unless you have some specific reason to go this way, don't.
There are some other ways to get to the same point—e.g., use DYLD_INSERT_LIBRARIES or SIMBL to get some code inserted into each app—but I can't think of anything else that can be done in pure Python.
A possible quick alternative may be this:
https://github.com/gurgeh/selfspy
It claims to work on both mac and windows. It is based on pyhook on the windows part.
Good luck.

How would one go about developing a curses-based UI?

I'm planning to develop a GUI application that uses curses. The idea is to provide an extra interface for a web interface, so that everything on the web site could also be done via the UI.
Basically, it should be platform independent: the user would have to SSH to the server after which the UI would automatically take over.
First of all, is this doable? As far as I understand, it would be platform independent as long as the end-user had the proper terminal software installed. Correct me, if I'm wrong.
I was planning to use Python for this, as it is the language I'm the most proficient in. Python comes with a curses module, and there is also Urwid, which I've been told is quite good.
After having a quick test with Urwid, I had some problems. The thing is, I'm quite worried that I won't find answers to the problems that I will encounter down the road because apparently curses UI-s aren't all the rage nowadays. Documentation and examples are thus quite scarce.
In conclusion, should I really embark on this and quit my whining, or drop the idea altogether? Any other suggestions?
It's certainly possible, and curses-based applications are still written regularly (e.g. PuDB is only 14 months old) although maybe not very often.
Did you try asking questions on the Urwid mailing list and/or IRC channel?
oh my, wouldn't this be a dream!
i've seen a couple of things out there to varying degrees of success.
Morticious Thrind: http://thrind.xamai.ca/
future death toll: http://f-dt.com/?wptheme=wp-cli
wordpress yadda yadda, this could be as simple as a 960/blueprint CSS, prototype.js, and a oneliner:
//TODO: Implement useful functionality && unit tests && documentation
//TODO: read
try { eval($F(x)); } catch (e) { panic(); }
BUT! this type of thing is pretty radical. i mean- ANYTHING can happen on the canvas of a web-browser these days, but any terminal emulator or lynx serves this purpose with flare.
also be sure to check out: https://stackoverflow.com/questions/472644/javascript-collection-of-one-line-useful-functions
the real question is what sort of software you plan on ncursing (sic,pun,etc.)-- it probably already has some rather useful command-line interfaces (sh).
It can be done but it's a struggle. I would recommend improving the web interface. You can use JavaScript to add keyboard shortcuts, for example, which can be very helpful for a faster workflow (see Gmail's interface, for example).

Is there a cross-platform python low-level API to capture or generate keyboard events?

I am trying to write a cross-platform python program that would run in the background, monitor all keyboard events and when it sees some specific shortcuts, it generates one or more keyboard events of its own. For example, this could be handy to have Ctrl-# mapped to "my.email#address", so that every time some program asks me for my email address I just need to type Ctrl-#.
I know such programs already exist, and I am reinventing the wheel... but my goal is just to learn more about low-level keyboard APIs. Moreover, the answer to this question might be useful to other programmers, for example if they want to startup an SSH connection which requires a password, without using pexpect.
Thanks for your help.
Note: there is a similar question but it is limited to the Windows platform, and does not require python. I am looking for a cross-platform python api. There are also other questions related to keyboard events, but apparently they are not interested in system-wide keyboard events, just application-specific keyboard shortcuts.
Edit: I should probably add a disclaimer here: I do not want to write a keylogger. If I needed a keylogger, I could download one off the web anyway. ;-)
There is no such API. My solution was to write a helper module which would use a different helper depending on the value of os.name.
On Windows, use the Win32 extensions.
On Linux, things are a bit more complex since real OSes protect their users against keyloggers[*]. So here, you will need a root process which watches one of the handles in /dev/input/. Your best bet is probably looking for an entry below /dev/input/by-path/ which contains the strings "kbd" or "keyboard". That should work in most cases.
[*]: Jeez, not even my virus/trojan scanner will complain when I start a Python program which hooks into the keyboard events...
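The device lookup described above could be sketched roughly as follows. This is an illustration only: the helper names looks_like_keyboard and find_keyboard_devices are made up here, and actually reading key events from the returned paths (which needs root and decoding of the kernel's input event records) is not shown.

```python
import os

BY_PATH_DIR = "/dev/input/by-path"


def looks_like_keyboard(name):
    """Apply the heuristic from the answer: the entry name mentions kbd or keyboard."""
    lowered = name.lower()
    return "kbd" in lowered or "keyboard" in lowered


def find_keyboard_devices(directory=BY_PATH_DIR):
    """Return candidate keyboard device paths, or an empty list if none are visible."""
    try:
        names = sorted(os.listdir(directory))
    except OSError:
        # Directory missing (not Linux, or no udev by-path links) or unreadable.
        return []
    return [os.path.join(directory, n) for n in names if looks_like_keyboard(n)]
```

On a machine without /dev/input/by-path the function simply returns an empty list, so the caller can fall back to another strategy.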
As the guy that wrote the original pykeylogger linux port, I can say there isn't really a cross platform one. Essentially I rewrote the pyhook API for keyboard events to capture from the xserver itself, using the record extension. Of course, this assumes the record extension is there, loaded into the x server.
From there, it's essentially just detecting if you're on windows, or linux, and then loading the correct module for the OS. Everything else should be identical.
Take a look at the pykeylogger source, in pyxhook.py, for the class and implementation. Otherwise, just load that module, or pyhook instead, depending on OS.
I've made a few tests on Ubuntu 9.10. pykeylogger doesn't seem to be working. I tried to change /etc/X11/xorg.conf in order to allow the module to be loaded, but in that specific version of Ubuntu there is no xorg.conf. So, in my opinion, pykeylogger is NOT working on Ubuntu 9.10!
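The "detect the OS, then load the right module" step might look something like this. It is a sketch under assumptions: the mapping and the function names hook_module_name and load_hook_module are invented for illustration, sys.platform is used for the check (os.name would work similarly), and the two module names are simply the ones mentioned in this answer.

```python
import importlib
import sys

# Hypothetical mapping from a sys.platform prefix to the hook module to import.
HOOK_MODULES = {
    "win32": "pyHook",
    "linux": "pyxhook",
}


def hook_module_name(platform=sys.platform):
    """Pick the hook module name for the given platform string."""
    for prefix, module in HOOK_MODULES.items():
        if platform.startswith(prefix):
            return module
    raise NotImplementedError("no keyboard hook backend for %r" % platform)


def load_hook_module(platform=sys.platform):
    """Import and return the platform's hook module (it must be installed)."""
    return importlib.import_module(hook_module_name(platform))
```

The rest of the program can then call load_hook_module() once at startup and use the returned module through the shared pyhook-style API.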
Cross-platform UI libraries such as Tkinter or wxPython have API for keyboard events. Using these you could map «CTRL» + «#» to an action.
On linux, you might want to have a look at pykeylogger. For some strange reason, reading from /dev/input/.... doesn't always work when X is running. For example it doesn't work on ubuntu 8.10. Pykeylogger uses xlib, which works exactly when the other way doesn't. I'm still looking into this, so if you find a simpler way of doing this, please tell me.
Under Linux it's possible to do this quite easily with Xlib. See this page for details:
http://www.larsen-b.com/Article/184.html
