I'm just starting on an application that will need to be able to receive multimedia key (play/pause, skip, previous) presses. I'm looking to target Mac, Linux (major distros), and Windows. I've seen a solution for GNOME that appears to do what I need, but, as simple as it sounds, I have never found anything that can pick up those keys on all major platforms. I also need to be able to pick up the keys globally, since the application will run in the background and won't ever have focus.
Currently, I'm not strongly tied to Python, but since I'd like to be able to target multiple platforms, Python seemed like the way to go. Has anyone written any cross-platform libraries that can do this? I haven't been able to find any that work.
PyQt looks like a potentially viable option, but some people have hinted that global key detection may be problematic on OS X.
With PyQt (or PySide) you can use the Qt::AA_CaptureMultimediaKeys application flag to enable cross-platform capturing of multimedia keys. In principle, using that flag your Qt program should be able to receive keyboard events when the user presses multimedia keys such as Play (Qt::Key_MediaPlay), Stop (Qt::Key_MediaStop), Pause (Qt::Key_MediaPause) etc. For a full list of supported keys, have a look at the documentation.
I cannot say if all keys will be supported on all platforms, but in general Qt aims to provide very good interoperability between different operating systems. I think a simple prototype should answer that question quickly (I don't have access to a macOS environment so I cannot test it there, but on Windows and Linux it should work). For more information on how to process keyboard events using Qt, have a look at the documentation of the QKeyEvent class.
I took over python-mmkeys a few months ago. I actually never tried to compile it, but it was included in the code of a project.
It is PyGTK dependent, but it is available on GNU/Linux, Mac OS X, and Windows.
The code is pretty easy to use:
import mmkeys
keys = mmkeys.Mmkeys()
keys.connect("mm_prev", previous_cb)
keys.connect("mm_next", next_cb)
keys.connect("mm_playpause", playpause_cb)
This problem involves the collision of several problems, all of which I understand only somewhat well, but I include them together because they could all be the entry point for a solution. Here is the best description I can give.
I have an app, in python. (I imagine I could theoretically solve all of these problems by learning Cocoa and ObjectiveC, but that seems like QUITE a lift, for this problem -- AND, as noted below, this problem may not actually be related to python, really, at all. I just don't know.) A CORE feature of this app is to trigger a minigame, with a hotkey -- meaning, the hotkey itself is fundamental to the desired functionality. And furthermore, I would really like to package this app, to let other people use it. (Locally, it works great! Hey!)
The problem starts with the fact that adding the hotkey -- which I am doing with
import keyboard
keyboard.add_hotkey('windows+shift+y', trigger_minigame)
-- requires root access. Due to DIRE WARNINGS in another SO post Forcing a GUI application to run as root (which, honestly, I only vaguely understand), I would like to grant that access to ONLY this part of the program. I IMAGINE, such an approach would look something like this:
# needs_root.py
import keyboard
from shouldnt_have_root import trigger_minigame
keyboard.add_hotkey('windows+shift+y', trigger_minigame)
# shouldnt_have_root.py
def minigame():
    pass  # buncha pygame, GUI stuff (which is dangerous???)
def trigger_minigame():
    pass  # adds event to minigame's event queue
# bash script
sudo python needs_root.py
HOWEVER -- there are several major challenges!
The biggest is that I don't even know if THAT is safe, since I don't know how security and permissions (especially with imports) works at all! And more generally, how dangerous are the imports? It appears that I may in fact have to import substantially more, to make it clear what event queue the trigger is adding an event TO -- and I don't know how to have that communication happen, while still isolating the GUI parts (or generally dangerous ones) from unnecessary and hazardous access.
There's another layer too though; packaging it through pyinstaller means that I can't target the scripts directly, because they'll have been turned into binaries, but according to THIS answer Packaging multiple scripts in PyInstaller it appears I can just target the binaries instead, i.e. have the first binary call
osascript -e 'do shell script "python needs_root_binary" with administrator privileges'
to get the user to bless only the necessary part, but I don't know if that will put OTHER obstacles, or vulnerabilities (or inter-file communication difficulties), in the way.
LAST, I could try STARTING as root, and then switching away from it, as soon as the hotkey is set (and before anything else happens) -- but would that be safe? I'm still worried about the fact that it involves running sudo on the whole app.
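For what it's worth, here is a rough sketch of what "start as root, then switch away" could look like on a Unix-like system. It assumes the app was launched through sudo (which sets SUDO_UID and SUDO_GID), and the name drop_privileges is made up for illustration. Whether this is actually safe enough is exactly what I'm asking.

```python
import os


def drop_privileges():
    """Switch from root back to the user who invoked sudo.

    Returns True if privileges were dropped, False if there was nothing to do.
    """
    # os.geteuid only exists on Unix-like systems.
    if not hasattr(os, "geteuid") or os.geteuid() != 0:
        return False  # not running as root
    uid = os.environ.get("SUDO_UID")
    gid = os.environ.get("SUDO_GID")
    if uid is None or gid is None:
        return False  # not started via sudo, so no user to fall back to
    os.setgroups([])     # drop root's supplementary groups first
    os.setgid(int(gid))  # the group has to change while we are still root
    os.setuid(int(uid))  # changing the user last makes the drop stick
    return True


if __name__ == "__main__":
    # The hotkey would be registered here (the step that needs root), then:
    print("dropped:", drop_privileges())
```

The order matters: once setuid has run, the process no longer has the rights to change its groups.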
In any event --
is this as big a mess as it feels?
How do I give root access to only a piece of a packaged .app, that I've written in python?
I'd advise you to:
enable the root access,
write the script,
disable the root access
as described in more detail here.
PyInstaller is another matter. When I was writing software that needed hotkeys, I had to use something other than keyboard, because it did not work properly on PCs without Python installed. Instead I made the hotkey with tkinter's built-in canvas.bind() function (more info here).
Hopefully I helped.
You cannot run a specific Python function as root; only the Python process executing your script can be run with elevated permissions.
So my answer is: your problem as described is unsolvable.
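To illustrate the point about processes: since privileges attach to a whole process, any split between "needs root" and "shouldn't have root" would have to be a split into two processes that talk over a pipe. The sketch below only shows that shape. HELPER_SOURCE and listen_for_triggers are hypothetical names, and the helper fakes a single hotkey press instead of calling keyboard.add_hotkey. In a real setup only the helper command would be started with elevated rights.

```python
import subprocess
import sys

# Stand-in for the privileged helper. A real helper would register the hotkey
# and write one line per press; this one just emits a single fake press.
HELPER_SOURCE = r'''
import sys
sys.stdout.write("hotkey\n")
sys.stdout.flush()
'''


def listen_for_triggers(command, on_trigger):
    """Run the helper and call on_trigger() for every "hotkey" line it prints.

    Returns the number of triggers seen before the helper exited.
    """
    proc = subprocess.Popen(command, stdout=subprocess.PIPE, text=True)
    count = 0
    for line in proc.stdout:
        if line.strip() == "hotkey":
            on_trigger()
            count += 1
    proc.wait()
    return count


if __name__ == "__main__":
    # The unprivileged side (e.g. the GUI) only ever sees lines of text.
    seen = listen_for_triggers([sys.executable, "-c", HELPER_SOURCE],
                               lambda: print("trigger minigame"))
    print("triggers:", seen)
```

The unprivileged process never imports anything from the helper, so the only thing crossing the boundary is the text on the pipe.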
Background: I'm working on a piece of software called ActivityWatch that logs what you do on your computer. Basically an attempt at addressing some of the issues with: RescueTime, selfspy, arbtt, etc.
One of the core things we do is log information about the active window (class and title). In the past, this has been done on Linux using xprop, and now python-xlib, without issue.
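For context, the xprop approach looks roughly like the sketch below. This is not the actual ActivityWatch code; the function names are made up for illustration, and it assumes xprop prints lines in its usual form, such as `_NET_ACTIVE_WINDOW(WINDOW): window id # 0x3e00004` and `WM_CLASS(STRING) = "Navigator", "Firefox"`.

```python
import re
import subprocess


def parse_active_window_id(xprop_root_output):
    """Extract the window id from `xprop -root _NET_ACTIVE_WINDOW` output."""
    match = re.search(r"window id # (0x[0-9a-fA-F]+)", xprop_root_output)
    return match.group(1) if match else None


def parse_string_property(xprop_output, name):
    """Return the quoted values of a property line such as WM_CLASS."""
    for line in xprop_output.splitlines():
        if line.startswith(name + "("):
            return re.findall(r'"((?:[^"\\]|\\.)*)"', line)
    return []


def get_active_window():
    """Query xprop for the class and title of the focused X11 window."""
    root = subprocess.run(["xprop", "-root", "_NET_ACTIVE_WINDOW"],
                          capture_output=True, text=True, check=True).stdout
    window_id = parse_active_window_id(root)
    if window_id is None:
        return None
    props = subprocess.run(["xprop", "-id", window_id, "WM_CLASS", "WM_NAME"],
                           capture_output=True, text=True, check=True).stdout
    return {"class": parse_string_property(props, "WM_CLASS"),
            "title": parse_string_property(props, "WM_NAME")}
```

None of this has an equivalent in the core Wayland protocol, which is the problem described next.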
But now we have a problem: Wayland is on the rise, and as far as I can see Wayland has no notion of an active window. So my fear is that we will have to implement support for each and every desktop environment available for Wayland (assuming they'll provide the capability to get information about the active window at all).
Hopefully they'll eventually converge and have some common interface to get this done, but I'm not holding my breath...
I've been anticipating this issue. But today we got our first user request for Wayland support by an actual Wayland user. As larger distros are adopting Wayland as the default display server protocol (Fedora 25 is already using it, Ubuntu will switch in 17.10 which is coming soon) the situation is going to get more critical over time.
Relevant issues for ActivityWatch:
https://github.com/ActivityWatch/aw-watcher-window/issues/18
https://github.com/ActivityWatch/activitywatch/issues/92
There are other applications like ActivityWatch that would require the same functionality (RescueTime, arbtt, selfspy, etc.). They don't seem to support Wayland right now, and I can't find any details about them planning to do so.
I'm now interested in implementing support for Gnome to start off with and follow up with others as the path becomes more clear.
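If support does end up being per desktop environment, the watcher would first have to work out what it is running under. A minimal sketch of that dispatch, assuming the usual XDG_SESSION_TYPE, WAYLAND_DISPLAY, DISPLAY and XDG_CURRENT_DESKTOP environment variables are set by the session; the backend labels ("xlib", "gnome-shell", "unsupported") are placeholders, not real modules:

```python
import os


def describe_session(env=None):
    """Return (session_type, desktops) based on environment variables."""
    env = os.environ if env is None else env
    session_type = env.get("XDG_SESSION_TYPE", "").lower()
    if not session_type:
        # Fall back to the variables each display server sets for its clients.
        if env.get("WAYLAND_DISPLAY"):
            session_type = "wayland"
        elif env.get("DISPLAY"):
            session_type = "x11"
        else:
            session_type = "unknown"
    # XDG_CURRENT_DESKTOP is a colon-separated list, e.g. "ubuntu:GNOME".
    desktops = [d.lower() for d in env.get("XDG_CURRENT_DESKTOP", "").split(":") if d]
    return session_type, desktops


def pick_watcher(env=None):
    """Choose a (hypothetical) active-window backend for this session."""
    session_type, desktops = describe_session(env)
    if session_type == "x11":
        return "xlib"
    if session_type == "wayland" and "gnome" in desktops:
        return "gnome-shell"
    return "unsupported"
```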
A similar question concerning Weston has been asked here: get the list of active windows in wayland weston
Edit: I asked in #wayland on Freenode, got the following reply:
15:20:44 ErikBjare Hello everybody. I'm working on a piece of self-tracking software called ActivityWatch (https://github.com/ActivityWatch/activitywatch). I know this isn't exactly the right place to ask, but I was wondering if anyone knew anything about getting the active window in any Wayland-using DE.
15:20:57 ErikBjare Created a question on SO: https://stackoverflow.com/questions/45465016/how-do-i-get-the-active-window-on-gnome-wayland
15:21:25 ErikBjare Here's the issue in my repo for it: https://github.com/ActivityWatch/activitywatch/issues/92
15:22:54 ErikBjare There are a bunch of other applications that depend on it (RescueTime, selfspy, arbtt, ulogme, etc.) so they'd need it as well
15:24:23 blocage ErikBjare, in the core protocol you cannot know which windnow has the keyboard or cursor focus
15:24:39 blocage ErikBjare, in the wayland core protocol *
15:25:10 blocage ErikBjare, you can just know if your window has the focus or not, it a design choise
15:25:23 blocage avoid client spying each other
15:25:25 ErikBjare blocage: I'm aware, that's my reason for concern. I'm not saying it should be included or anything, but as it looks now every DE would need to implement it themselves if these kind of applications are to be supported
15:25:46 ErikBjare So wondering if anyone knew the teams working with Wayland on Gnome for example
15:26:11 ErikBjare But thanks for confirming
15:26:29 blocage ErikBjare, DE should create a custom extension, or use D-bus or other IPC
15:27:31 blocage ErikBjare, I guess some compositor are around here, but I do not know myself if there is such extension already
15:27:44 blocage compositor developers *
15:28:36 ErikBjare I don't think there is (I've done quite a bit of searching), so I guess I need to catch the attention of some DE developers
15:29:16 ErikBjare Thanks a lot though
15:29:42 ErikBjare blocage: Would you mind if I shared logs of our conversation in the issue?
15:30:05 blocage just use it :) it's public
15:30:19 ErikBjare ty :)
Edit 2: Filed an enhancement issue in the Gnome bugtracker.
tl;dr: How do I get the active window on Gnome when using Wayland?
The two previous answers are outdated; this is the current state of querying appnames and titles of windows in (Gnome) Wayland. There are two options:
1. A Gnome-specific JavaScript API which can be accessed over DBus
2. The wlr-foreign-toplevel-management Wayland protocol (unfortunately not implemented by Gnome)
The Gnome-specific API will likely break between Gnome versions, but it works. It is heavily dependent on Gnome internal API to work so there is no chance of it becoming a standard API. There is a PR on aw-watcher-window to add this, but it needs some clean-up and afk-support if that's possible.
The wlr-foreign-toplevel-management protocol is (at the time of writing this) implemented by the Sway, Mir, Phosh and Wayfire compositors. Together with the idle.xml protocol, which is pretty widely implemented by Wayland compositors, there's a complete implementation with afk-detection for ActivityWatch in aw-watcher-window-wayland. I've been in discussions with sway/rootston developers about whether Wayland appnames and X11 wm_class are interchangeable. Both Sway and Phosh now use them interchangeably, so there should no longer be any distinguishable difference between Wayland and XWayland windows in the API.
I have not researched if KWin has some API similar to Gnome Shell to fetch appnames and titles, but it does at least not implement wlr-foreign-toplevel-management.
In my opinion, the answer lies neither in Wayland itself nor in any existing library (there isn't one). On gnome-wayland, the component that knows about the active window is Mutter, so you need a way to ask Mutter for it. Gnome could develop an internal API that asks Mutter for the active window and so restore the functionality, but right now there is nowhere to ask. Mutter is unlikely to expose its internal representation through an API, because that would be specific to Mutter rather than common to all Wayland window managers. So this would need to go into an external library that talks to whichever window manager is in use and resolves your request in a general way.
Another possibility is a Wayland extension through which every window manager shares the current active window, together with a library that talks directly to Wayland to restore the functionality.
So your app has a big problem. The most you can do is request this from Mutter (which knows the active window), but in my opinion it cannot be resolved in Mutter alone.
I hope this will help you and you can find a way. Good luck.
https://stackoverflow.com/a/64030239/388010 has the correct answer. Nevertheless, here's the concrete and unsatisfying solution that implements option (1). The following works through a gnome extension using Gnome 43 at the time of writing (and perhaps will keep on working as long as the extension is maintained):
Install https://extensions.gnome.org/extension/4974/window-calls-extended/
Run gdbus call --session --dest org.gnome.Shell --object-path /org/gnome/Shell/Extensions/WindowsExt --method org.gnome.Shell.Extensions.WindowsExt.FocusPID | sed -E "s/\\('(.*)',\\)/\\1/g" to get the PID of the focus window or use a different method of WindowsExt.
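If you'd rather do the same from Python than through sed, something along these lines should work. It is only a sketch: it shells out to the same gdbus command and strips the `('...',)` wrapper with a regex equivalent to the sed expression. The function names are mine, and it assumes the extension above is installed.

```python
import re
import subprocess

# Same call as the gdbus command line above, split into arguments.
GDBUS_COMMAND = [
    "gdbus", "call", "--session",
    "--dest", "org.gnome.Shell",
    "--object-path", "/org/gnome/Shell/Extensions/WindowsExt",
    "--method", "org.gnome.Shell.Extensions.WindowsExt.FocusPID",
]


def unwrap_gdbus_reply(reply):
    """Strip the ('...',) tuple wrapper that gdbus prints around a string."""
    match = re.fullmatch(r"\('(.*)',\)", reply.strip())
    # Like the sed expression, leave the text alone if it doesn't match.
    return match.group(1) if match else reply.strip()


def focused_pid():
    """Ask the extension for the PID of the focused window (as a string)."""
    reply = subprocess.run(GDBUS_COMMAND, capture_output=True,
                           text=True, check=True).stdout
    return unwrap_gdbus_reply(reply)
```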
I have a script called preguiça.py, that does exactly what you're doing, though it is probably a lot simpler and I haven't released it.
For my script, I acquired the window title using PyGObject's Window Navigator Construction Kit (Wnck).
Here's a simplified version of it, with the essential parts:
import gi
gi.require_version("Wnck", "3.0")  # pick the Wnck version before importing it
from gi.repository import Wnck
from gi.repository import GObject
def changed(screen, window, data):
    # Per the libwnck docs, the second argument is the previously active window;
    # uncomment the get_active_window() line to query the current one instead.
    print("Changed!")
    # window = screen.get_active_window()
    if window:
        print("Title: %s" % window.get_name())
screen = Wnck.Screen.get_default()
screen.connect("active-window-changed", changed, None)
mainLoop = GObject.MainLoop()
try:
    mainLoop.run()
except KeyboardInterrupt:
    print("Hey")
mainLoop.unref()
The call that fetches the active window is commented out in the example above (I relied on the window the callback receives), but you may need it depending on your implementation.
I wrote it for X, and it didn't complain when I switched to Wayland, so it should probably work for you.
Note it doesn't get the information from Wayland, as you asked, but it is probably actually better, as it will be X/Wayland-agnostic. It got the title from an xterm I opened, so it should be toolkit-agnostic, as well.
Don't ask me on the details of the implementation, though. The code is at least four years old :)
I'm currently at a crossroads. I'm somewhat versed in Python (2.7) and would really like to start getting into GUI to give my (although mini) projects some more depth and versatility.
For the most part, my scripts don't use anything graphical so this is the first time I'm dipping my toes in this water.
That said, I've tried using pygame and tkinter but seem to fail at every turn to get something up and running (although I had some slight success with pygame).
Am I correct in understanding that both need X to be running in order to generate any type of interface, and that I therefore also need X to get any type of input (touchscreen presses)?
Thanks in advance!
In order to use tkinter, you must have a graphics system running. For Windows and OS X that simply means you need to be logged in (i.e. it can't run as a service). For Linux and other Unix-like systems that means that you must have X running.
Neither tkinter nor any of the other common GUI toolkits will write directly to the screen.
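A quick way to check this from a script is to try creating a Tk root window and see whether it fails. The sketch below is written for Python 3 (on 2.7 the module is called Tkinter), and the function name is just for illustration:

```python
def can_open_tk_window():
    """Return True if tkinter can create a window in this environment."""
    try:
        import tkinter  # may be missing on minimal Python installs
    except ImportError:
        return False
    try:
        root = tkinter.Tk()  # raises TclError when no display is available
    except tkinter.TclError:
        return False
    root.destroy()
    return True


if __name__ == "__main__":
    print("GUI available:", can_open_tk_window())
```

Run from a plain console without X, this should report that no GUI is available.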
I'm gonna give an alternative answer. If you know HTML, CSS and Javascript (or have time to give it a try) I would recommend using Flask, http://flask.pocoo.org/.
With flask you can create websites but you can also (as I am using it) let it be your GUI. It will work on any device and looks really good :).
I am actually working with pyHook, but I'd like to write my program for OS X too.
If someone knows of such a module ... I've been looking on the internet for a while, but found nothing really relevant.
-> The idea is to be able to record keystrokes outside the python app. My application is a community statistics builder, so it would be great to have statistics from OS X too.
Thanks in advance ;)
Edit:
PyHook : Record keystrokes and other things outside the python app
http://sourceforge.net/apps/mediawiki/pyhook/index.php?title=PyHook_Tutorial
http://pyhook.sourceforge.net/doc_1.5.0/
http://sourceforge.net/apps/mediawiki/pyhook/index.php?title=Main_Page
As far as I know, there is no Python library for this, so you're going to be calling native APIs. The good news is that PyObjC (which comes with the built-in Python on recent OS releases) often makes that easy.
There are two major options. For either of these to work, your app has to have a Cocoa/CoreFoundation runloop (just as in Windows, a lot of things require you to be a "Windows GUI executable" rather than a "command line executable"), which I won't explain how to do here. (Find a good tutorial for building GUI apps in Python, if you don't know how, because that's the simplest way.)
The easy option is the Cocoa global event monitor API. However, it has some major limitations. You only get events that are going to another app--which means media keys, global hotkeys, and keys that are for whatever reason ignored will not show up. Also, you need to be "trusted for accessibility". (The simplest way to do that is to ask the user to turn it on globally, in the Universal Access panel of System Preferences.)
The hard option is the Quartz event tap API. It's a lot more flexible, and it only requires exactly the appropriate rights (which, depending on the settings you use, may include being trusted for accessibility and/or running as root), and it's a lot more powerful, but it takes a lot more work to get started, and it's possible to screw up your system if you get it wrong (e.g., by eating all keystrokes and mouse events so they never get to the OS and you can't reboot except with the power button).
For references on all of the relevant functions, see https://developer.apple.com/library/mac/#documentation/Cocoa/Reference/ApplicationKit/Classes/nsevent_Class/Reference/Reference.html (for NSEvent) and https://developer.apple.com/library/mac/#documentation/Carbon/Reference/QuartzEventServicesRef/Reference/reference.html (for Quartz events). A bit of googling should turn up lots of sample code out there in Objective C (for NSEvent) or C (for CGEventTap), but little or nothing in Python, so I'll show some little fragments that illustrate how you'd port the samples to Python:
# Option 1: Cocoa global event monitor
import Cocoa
def evthandler(event):
    pass  # this is where you do stuff; see NSEvent documentation for event
observer = Cocoa.NSEvent.addGlobalMonitorForEventsMatchingMask_handler_(Cocoa.NSKeyDownMask, evthandler)
# when you're done
Cocoa.NSEvent.removeMonitor_(observer)
# Option 2: Quartz event tap
import Quartz
def evthandler(proxy, type, event, refcon):
    pass  # Here's where you do your stuff; see CGEventTapCallback
    return event
source = Quartz.CGEventSourceCreate(Quartz.kCGEventSourceStateHIDSystemState)
tap = Quartz.CGEventTapCreate(Quartz.kCGSessionEventTap,
                              Quartz.kCGHeadInsertEventTap,
                              Quartz.kCGEventTapOptionListenOnly,
                              (Quartz.CGEventMaskBit(Quartz.kCGEventKeyDown) |
                               Quartz.CGEventMaskBit(Quartz.kCGEventKeyUp)),
                              evthandler,
                              None)  # refcon: user data handed back to evthandler
Another option, at about the same level as Quartz events, is Carbon events (starting with InstallEventHandler). However, Carbon is obsolete, and on top of that, it's harder to get at from Python, so unless you have some specific reason to go this way, don't.
There are some other ways to get to the same point—e.g., use DYLD_INSERT_LIBRARIES or SIMBL to get some code inserted into each app—but I can't think of anything else that can be done in pure Python.
A possible quick alternative may be this:
https://github.com/gurgeh/selfspy
It claims to work on both Mac and Windows. The Windows part is based on pyHook.
Good luck.
I am trying to write a cross-platform python program that would run in the background, monitor all keyboard events and when it sees some specific shortcuts, it generates one or more keyboard events of its own. For example, this could be handy to have Ctrl-# mapped to "my.email#address", so that every time some program asks me for my email address I just need to type Ctrl-#.
I know such programs already exist, and I am reinventing the wheel... but my goal is just to learn more about low-level keyboard APIs. Moreover, the answer to this question might be useful to other programmers, for example if they want to startup an SSH connection which requires a password, without using pexpect.
Thanks for your help.
Note: there is a similar question but it is limited to the Windows platform, and does not require python. I am looking for a cross-platform python api. There are also other questions related to keyboard events, but apparently they are not interested in system-wide keyboard events, just application-specific keyboard shortcuts.
Edit: I should probably add a disclaimer here: I do not want to write a keylogger. If I needed a keylogger, I could download one off the web anyway. ;-)
There is no such API. My solution was to write a helper module which would use a different helper depending on the value of os.name.
On Windows, use the Win32 extensions.
On Linux, things are a bit more complex since real OSes protect their users against keyloggers[*]. So here, you will need a root process which watches one of the handles in /dev/input/. Your best bet is probably looking for an entry below /dev/input/by-path/ which contains the strings "kbd" or "keyboard". That should work in most cases.
[*]: Jeez, not even my virus/trojan scanner will complain when I start a Python program which hooks into the keyboard events...
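The lookup under /dev/input/by-path/ can be sketched like this. The function name is made up, and it only finds candidate device paths; actually reading events from them still needs root.

```python
import os


def find_keyboard_devices(directory="/dev/input/by-path"):
    """Return paths under `directory` whose names mention "kbd" or "keyboard"."""
    try:
        names = sorted(os.listdir(directory))
    except OSError:
        return []  # directory missing (e.g. not Linux) or not readable
    return [os.path.join(directory, name) for name in names
            if "kbd" in name.lower() or "keyboard" in name.lower()]


if __name__ == "__main__":
    for path in find_keyboard_devices():
        print(path)
```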
As the guy that wrote the original pykeylogger linux port, I can say there isn't really a cross platform one. Essentially I rewrote the pyhook API for keyboard events to capture from the xserver itself, using the record extension. Of course, this assumes the record extension is there, loaded into the x server.
From there, it's essentially just detecting if you're on windows, or linux, and then loading the correct module for the OS. Everything else should be identical.
Take a look at the pykeylogger source, in pyxhook.py, for the class and implementation. Otherwise, just load that module, or pyHook instead, depending on the OS.
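The "load the right module for the OS" step could look something like the sketch below. The helper names are only illustrative, and it assumes pyHook (Windows) or pyxhook (Linux) is importable on the machine it runs on.

```python
import importlib
import sys

# Platform prefix -> name of the hook module to import on that platform.
HOOK_MODULES = {"win32": "pyHook", "linux": "pyxhook"}


def hook_module_name(platform=None):
    """Return the hook module name for `platform` (defaults to this machine)."""
    platform = sys.platform if platform is None else platform
    for prefix, module in HOOK_MODULES.items():
        if platform.startswith(prefix):
            return module
    raise NotImplementedError("no keyboard hook backend for %r" % platform)


def load_hook_module(platform=None):
    """Import and return the hook module for this platform."""
    return importlib.import_module(hook_module_name(platform))
```

Since pyxhook mirrors the pyHook API for keyboard events, the rest of the program can use whichever module load_hook_module() returns in the same way.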
I've run a few tests on Ubuntu 9.10, and pykeylogger doesn't seem to be working. I tried to change /etc/X11/xorg.conf to allow the module to be loaded, but that specific version of Ubuntu has no xorg.conf. So, in my opinion, pykeylogger is NOT working on Ubuntu 9.10!
Cross-platform UI libraries such as Tkinter or wxPython have APIs for keyboard events. Using these you could map «CTRL» + «#» to an action.
On Linux, you might want to have a look at pykeylogger. For some strange reason, reading from /dev/input/.... doesn't always work when X is running. For example, it doesn't work on Ubuntu 8.10. Pykeylogger uses Xlib, which works exactly when the other way doesn't. I'm still looking into this, so if you find a simpler way of doing this, please tell me.
Under Linux it's possible to do this quite easily with Xlib. See this page for details:
http://www.larsen-b.com/Article/184.html