Ensure a single instance of an application in Linux - python

I'm working on a GUI application in WxPython, and I am not sure how I can ensure that only one copy of my application is running at any given time on the machine. Due to the nature of the application, running more than once doesn't make any sense, and will fail quickly. Under Win32, I can simply make a named mutex and check that at startup. Unfortunately, I don't know of any facilities in Linux that can do this.
I'm looking for something that will automatically be released should the application crash unexpectedly. I don't want to have to burden my users with having to manually delete lock files because I crashed.

The Right Thing is advisory locking using flock(LOCK_EX); in Python, this is found in the fcntl module.
Unlike pidfiles, these locks are always automatically released when your process dies for any reason, have no race conditions relating to file deletion (as the file doesn't need to be deleted to release the lock), and there's no chance of a different process inheriting the PID and thus appearing to validate a stale lock.
If you want unclean shutdown detection, you can write a marker (such as your PID, for traditionalists) into the file after grabbing the lock, and then truncate the file to 0-byte status before a clean shutdown (while the lock is being held); thus, if the lock is not held and the file is non-empty, an unclean shutdown is indicated.
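A minimal sketch of that marker idea (the lock-file path and messages here are illustrative, not part of any particular API):
import fcntl
import os
import sys

LOCK_PATH = '/tmp/myapp.lock'  # placeholder path

fd = os.open(LOCK_PATH, os.O_RDWR | os.O_CREAT, 0o644)
try:
    fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
except IOError:
    sys.exit('Another instance is already running')

# lock acquired; a non-empty file means the previous run never got to
# its clean-shutdown truncation
if os.read(fd, 64):
    print('Previous instance did not shut down cleanly')

# mark this run as "in progress" while holding the lock
os.ftruncate(fd, 0)
os.lseek(fd, 0, os.SEEK_SET)
os.write(fd, str(os.getpid()).encode())

# ... application runs ...

# clean shutdown: empty the file while the lock is still held
os.ftruncate(fd, 0)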

Complete locking solution using the fcntl module:
import sys
import fcntl

pid_file = 'program.pid'
fp = open(pid_file, 'w')
try:
    fcntl.lockf(fp, fcntl.LOCK_EX | fcntl.LOCK_NB)
except IOError:
    # another instance is running
    sys.exit(1)

There are several common techniques, including using semaphores. The one I see used most often is to create a "pid lock file" on startup that contains the pid of the running process. If the file already exists when the program starts up, open it and read the pid inside; check whether a process with that pid is running, and if it is, check the cmdline value in /proc/<pid> to see whether it is an instance of your program. If it is, quit; otherwise overwrite the file with your own pid. The usual name for the pid file is application_name.pid. A rough sketch of this check is shown below.
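This is only a sketch of the approach described above, not production code; the pid-file path and application name are placeholders, and the /proc check is Linux-specific:
import os
import sys

PID_FILE = '/tmp/application_name.pid'
APP_NAME = 'application_name'

def other_instance_running():
    if not os.path.isfile(PID_FILE):
        return False
    try:
        with open(PID_FILE) as f:
            pid = int(f.read().strip())
    except (ValueError, IOError):
        return False  # unreadable or empty pid file; treat as stale
    try:
        with open('/proc/{}/cmdline'.format(pid), 'rb') as f:
            cmdline = f.read()
    except IOError:
        return False  # no such process; the pid file is stale
    return APP_NAME.encode() in cmdline

if other_instance_running():
    sys.exit(1)
with open(PID_FILE, 'w') as f:
    f.write(str(os.getpid()))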

wxWidgets offers a wxSingleInstanceChecker class for this purpose: wxPython doc, or wxWidgets doc. The wxWidgets doc has sample code in C++, but the python equivalent should be something like this (untested):
name = "MyApp-%s" % wx.GetUserId()
checker = wx.SingleInstanceChecker(name)
if checker.IsAnotherRunning():
return False
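A slightly fuller, still untested sketch of the same idea inside a wx.App subclass (MyFrame is a placeholder for your own frame class); note that the checker must be kept alive, e.g. as an attribute, for the lifetime of the app:
import wx

class MyApp(wx.App):
    def OnInit(self):
        name = "MyApp-%s" % wx.GetUserId()
        self.checker = wx.SingleInstanceChecker(name)
        if self.checker.IsAnotherRunning():
            wx.MessageBox("Another instance is already running.", "MyApp")
            return False
        self.frame = MyFrame(None, title="MyApp")  # placeholder frame class
        self.frame.Show()
        return True

if __name__ == '__main__':
    MyApp(False).MainLoop()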

This builds upon the answer by user zgoda. It mainly addresses a tricky concern having to do with write access to the lock file. In particular, if the lock file was first created by root, another user foo can no longer successfully rewrite this file due to an absence of write permissions for user foo. The obvious solution is to create the file with write permissions for everyone. This solution also builds upon a different answer by me, having to do with creating a file with such custom permissions. This concern is important in the real world, where your program may be run by any user, including root.
import fcntl, os, stat, sys, tempfile

app_name = 'myapp'  # <-- Customize this value

# Establish lock file settings
lf_name = '.{}.lock'.format(app_name)
lf_path = os.path.join(tempfile.gettempdir(), lf_name)
lf_flags = os.O_WRONLY | os.O_CREAT
lf_mode = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH  # This is 0o222, i.e. 146

# Create lock file
# Regarding umask, see https://stackoverflow.com/a/15015748/832230
umask_original = os.umask(0)
try:
    lf_fd = os.open(lf_path, lf_flags, lf_mode)
finally:
    os.umask(umask_original)

# Try locking the file
try:
    fcntl.lockf(lf_fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
except IOError:
    msg = ('Error: {} may already be running. Only one instance of it '
           'can run at a time.'
           ).format(app_name)
    sys.exit(msg)
A limitation of the above code is that if the lock file already existed with unexpected permissions, those permissions will not be corrected.
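If that limitation matters to you, one possible workaround (a sketch, using the names from the code above) is to try to widen the permissions after opening, ignoring failures for files owned by other users:
try:
    os.chmod(lf_path, lf_mode)
except OSError:
    pass  # only the file's owner (or root) can change its mode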
I would've liked to use /var/run/<appname>/ as the directory for the lock file, but creating this directory requires root permissions. You can make your own decision for which directory to use.
Note that there is no need to open a file handle to the lock file.

Here's the TCP port-based solution:
# Use a listening socket as a mutex against multiple invocations
import socket
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.bind(('127.0.0.1', 5080))
s.listen(1)
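To turn the bind failure into a clean "already running" exit, you might wrap it in a try/except (a sketch; the port number is arbitrary):
import socket
import sys

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    s.bind(('127.0.0.1', 5080))
except socket.error:
    # the port is taken, so another instance is presumably running
    sys.exit('Another instance appears to be running')
s.listen(1)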

Look for a Python module that interfaces to SysV semaphores on Unix. The semaphores have a SEM_UNDO flag which will cause the resources held by a process to be released if the process crashes.
Otherwise as Bernard suggested, you can use
import os
os.getpid()
And write it to /var/run/application_name.pid. When the process starts, it should check if the pid in /var/run/application_name.pid is listed in the ps table and quit if it is, otherwise write its own pid into /var/run/application_name.pid. In the following var_run_pid is the pid you read from /var/run/application_name.pid
cmd = "ps -p %s -o comm=" % var_run_pid
app_name = os.popen(cmd).read().strip()
if len(app_name) > 0:
Already running

The set of functions defined in semaphore.h -- sem_open(), sem_trywait(), etc -- are the POSIX equivalent, I believe.

If you create a lock file and put the pid in it, you can check your process id against it and tell if you crashed, no?
I haven't done this personally, so take with appropriate amounts of salt. :p

Can you use the 'pidof' utility? If your app is running, pidof will write the Process ID of your app to stdout. If not, it will print a newline (LF) and return an error code.
Example (from bash, for simplicity):
linux# pidof myapp
8947
linux# pidof nonexistent_app
linux#
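Called from Python, that might look roughly like this (an untested sketch; 'myapp' is a placeholder, and for a Python script you may need pidof -x):
import subprocess

try:
    pids = subprocess.check_output(['pidof', 'myapp']).strip()
    print('Already running with PID(s): %s' % pids.decode())
except subprocess.CalledProcessError:
    print('Not running')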

By far the most common method is to drop a file into /var/run/ called [application].pid which contains only the PID of the running process, or parent process.
As an alternative, you can create a named pipe in the same directory to be able to send messages to the active process, e.g. to open a new file.
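A minimal sketch of that named-pipe idea (the path is a placeholder; a real application would read the pipe in a background thread rather than blocking its main loop):
import os

FIFO_PATH = '/tmp/myapp.fifo'  # placeholder path

if not os.path.exists(FIFO_PATH):
    os.mkfifo(FIFO_PATH)

# first instance: opening for reading blocks until another process writes
with open(FIFO_PATH) as fifo:
    for command in fifo:
        print('received command: %s' % command.strip())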

I've made a basic framework for running these kinds of applications when you want to be able to pass the command line arguments of subsequent attempted instances to the first one. An instance will start listening on a predefined port if it does not find an instance already listening there. If an instance already exists, it sends its command line arguments over the socket and exits.
code w/ explanation
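The core of that pattern is roughly the following (a sketch; the port number and message framing are arbitrary choices, not taken from the linked code):
import socket
import sys

PORT = 47200  # arbitrary fixed port

def main(args):
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        server.bind(('127.0.0.1', PORT))
    except socket.error:
        # an instance is already listening: forward our arguments and exit
        client = socket.create_connection(('127.0.0.1', PORT))
        client.sendall(' '.join(args).encode())
        client.close()
        sys.exit(0)
    server.listen(1)
    while True:
        conn, _ = server.accept()
        forwarded = conn.recv(4096).decode()
        print('received arguments from another invocation: %s' % forwarded)
        conn.close()

if __name__ == '__main__':
    main(sys.argv[1:])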

Related

Python : http.server.HTTPServer : How to close ALL opened files?

So basically, I am making an HTTP webhooks server in Python 3 and wanted to add a restart function because shell access is very limited on the server it will be running on.
I found this snippet somewhere on Stack Overflow earlier:
import logging
import os
import sys

import psutil

def restart_program():
    """Restarts the current program, with file objects and descriptors
    cleanup
    """
    try:
        p = psutil.Process(os.getpid())
        fds = p.open_files() + p.connections()
        print(fds)
        for handler in fds:
            os.close(handler.fd)
    except Exception as e:
        logging.error(e)
    python = sys.executable
    os.execl(python, python, *sys.argv)
For the most part, it works, but I wanted to make sure so I ran a few tests with lsof and found that every time I restarted the server, two more lines (files) were added to the list of open files:
python3 13923 darwin 5u systm 0x18cd0c0bebdcbfd7 0t0 [ctl com.apple.netsrc id 9 unit 36]
python3 13923 darwin 6u unix 0x18cd0c0beb8fc95f 0t0 ->0x18cd0c0beb8fbcdf
(the addresses varying on each restart)
These are only present when I initiate httpd = ThreadingSimpleServer((host, port), Handler). But even after I call httpd.server_close() these open files persist and psutil doesn't seem to find them.
This isn't really a required feature. If this proves to be too much overhead I can drop it, but right now I am only interested in why my code doesn't work and a solution, for my own sanity.
Thanks in advance!
UPDATE:
Changing p.connections() to p.connections(kind='all') got me the unix type fd. Still not sure how to close the systm type fd. Turns out the unix fd had to do with DNS...
UPDATE:
Well, it looks like I found a solution, however messy it may be.
import logging
import os
import subprocess
import sys

import psutil

class MyFileHandler(object):
    """docstring for MyFileHandler."""
    def __init__(self, fd):
        super(MyFileHandler, self).__init__()
        self.fd = fd

def get_open_systm_files(pid=os.getpid()):
    proc = subprocess.Popen(['lsof', '-p', str(pid)], stdout=subprocess.PIPE)
    return [MyFileHandler(int(str(l).split(' ')[6][:-1]))
            for l in proc.stdout.readlines() if b'systm' in l]

def restart_program():
    """Restarts the current program, with file objects and descriptors
    cleanup
    """
    try:
        p = psutil.Process(os.getpid())
        # include unix sockets (kind='all') and the lsof-detected 'systm'
        # descriptors, per the updates above
        fds = p.open_files() + p.connections(kind='all') + get_open_systm_files()
        print(fds)
        for handler in fds:
            os.close(handler.fd)
    except Exception as e:
        logging.error(e)
    python = sys.executable
    os.execl(python, python, *sys.argv)
It's not pretty, but it works.
If anyone could shed some light on what actually is/was going on I would very much like to know.
Mmm, that looks like a very hackish way to restart a process and a bad idea in general. What is your use case? Why do you want to restart a process to begin with? Regardless of your motivations, the usual way to interact with processes in that sense is via signals. I am not aware of signals designed specifically to restart a process, though. What you usually want to do is terminate it (SIGTERM) and have something like systemd or zdaemon automatically restart it. You can even write a signal handler to execute cleanup functions on SIGTERM, and that is the correct way to do cleaning up. You don't usually want to restart a process, though, let alone do it from the app itself; that looks like a recipe for trouble.
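A minimal sketch of such a SIGTERM cleanup handler (cleanup() is a hypothetical function you would define yourself):
import signal
import sys

def handle_sigterm(signum, frame):
    cleanup()    # hypothetical: close sockets, remove pid files, etc.
    sys.exit(0)  # exit; a supervisor (systemd, zdaemon, ...) can restart us

signal.signal(signal.SIGTERM, handle_sigterm)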

How to guarantee file removing after script stopped working?

I have a script run by crontab every hour that interacts with an API (database sync). Usually it takes an hour or so, and before the next run I check whether this process is still in memory or not:
#!/usr/bin/env python
import os
import sys

pid = str(os.getpid())
pidfile = "/tmp/mydaemon.pid"

if os.path.isfile(pidfile):
    print "%s already exists, exiting" % pidfile
    sys.exit()

file(pidfile, 'w').write(pid)
try:
    pass  # Do some actual work here
finally:
    os.unlink(pidfile)
BUT after some time the script stopped working. When I look at "ps aux | grep python", I don't see this script as a process, but I do see the pid file still in place.
And when I run the script manually, I see information printed iteratively on the screen, but after some time I see the word "Terminated", the script exits, and the file is still in place.
How can I guarantee 100% that the file is removed after the script stops working?
Thanks!
It looks like your script is terminated unexpectedly, most probably due to too high memory usage. It's not guaranteed that finally will be executed on unexpected program termination. So, first of all, I suggest you find the cause of the unexpected termination and fix it.
Actually there is no 100% way to guarantee that the file will be removed. However, there are a few workarounds for handling dangling pid files.
Place your pid files on the /var/run volume, so they will be removed on unexpected system restart.
Check whether a process with that pid is still running on each script execution:
import os

def is_alive(pid):
    try:
        os.kill(pid, 0)  # does nothing, but raises an exception if the pid doesn't exist
        return True
    except OSError:
        return False

# and add this to your code:
if os.path.isfile(pidfile):
    with open(pidfile) as f:
        if is_alive(int(f.read())):
            sys.exit()
Again, the provided code is not 100% safe because of possible pid collisions. You can make the verification of the running process more sophisticated by adding parsing of ps command output: try to find a line with the desired pid value and check whether it looks similar to your crontab entry.
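For example, such a check might look roughly like this (a sketch; 'mydaemon.py' is a placeholder for whatever your crontab entry actually runs):
import subprocess

def looks_like_my_script(pid, script_name='mydaemon.py'):
    try:
        cmdline = subprocess.check_output(['ps', '-p', str(pid), '-o', 'args='])
    except subprocess.CalledProcessError:
        return False  # ps found no such process
    return script_name in cmdline.decode()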
Normally you can use the atexit module functionality, but in your case (unexpected termination) it also may not work.
Maybe use of mkstemp (specifying the required program suffix/prefix) within a with statement may work: it will create a unique pidfile in /tmp and clear it when the with block completes or terminates.

python - prevent IOError: [Errno 5] Input/output error when running without stdout

I have a script that runs automatically on a server through a cronjob, and it imports and runs several other scripts.
Some of them use prints, which naturally raises IOError: [Errno 5] Input/output error because the script runs without any SSH / terminal connected, so there's no proper stdout set up.
There are lots of questions about this subject but I couldn't find one that actually solves it, assuming I can't remove the prints or change the executed scripts.
I tried several things, including:
import sys

class StdOut(object):
    def __init__(self):
        pass
    def write(self, string):
        pass

sys.stdout = StdOut()
sys.stderr = StdOut()
and
from __future__ import print_function
import __builtin__

def print(*args, **kwargs):
    pass

__builtin__.print = print
But none of it works. I assume it only affect the module itself and not the modules I import / run later.
So how can I create a stub stdout that will affect all modules in the process? Like I said, I don't want to change the scripts that are executed from the main module, but I can change everything inside the importing module. And just to clarify - everything is imported, no new processes are spawned, etc.
Thanks,
Modifying the builtin or changing sys.stdout should work (except for subprocesses—but you ruled those out) as long as you do it early enough. If not, though, there's a lower level trick that's much easier:
run your python scripts with I/O redirection that discards output:
python foo.py >/dev/null 2>&1
(assuming Unix-y scripts, as implied by "cron" in the question)
or, redirect file descriptors 1 and 2 (same idea as above, but done within your Python startup rather than as part of the cron-invoked command):
import os
fd = os.open(os.devnull, os.O_RDWR)
# NB: even if stdin is closed, fd >= 0
os.dup2(fd, 1)
os.dup2(fd, 2)
if fd > 2:
    os.close(fd)
(this particular bit of code has the side effect of making /dev/null act as stdin, if all descriptors were closed). [Edit: I started with with open(...) and then switched to os.open and did not test the final version. Fixed now.]
All that said, a good cron really should have stdout and stderr connected somewhere, and should email the output/error-output to you. Not all cron versions are this nice though.

How to get process PID for manual lock mechanism in Python?

I would like to make a simple locking mechanism in Python without having to rely on the existing libraries for locking (namely fcntl and probably others)
I already have a small stub, but after searching for a bit I couldn't find a good answer on how to manually create the lock file and put the process PID inside. Here is my stub:
dir_name = "/var/lock/mycompany"
file_name = "myapp.pid"
lock = os.path.join(dir_name, file_name)
if os.path.exists(lock):
print >> sys.stderr, "already running under %s, exiting..." % lock
# display process PID contained in the file, not relevant to my question
sys.exit(ERROR_LOCK)
else:
# create the file 'lock' and put the process PID inside
How can I get the current process PID and put it inside the lock file? I thought about looking at /proc filesystem but that seems a bit too much for such a simple task.
Thanks.
http://docs.python.org/library/os.html#os.getpid
open(lock, 'w').write(str(os.getpid()))
Don't neglect to convert the result of os.getpid() to a string with str(os.getpid()). write wants a string argument.

How to get environment from a subprocess?

I want to call a process via a Python program; however, this process needs some specific environment variables that are set by another process. How can I get the first process's environment variables and pass them to the second?
This is what the program look like:
import subprocess
subprocess.call(['proc1']) # this set env. variables for proc2
subprocess.call(['proc2']) # this must have env. variables set by proc1 to work
but the two processes don't share the same environment. Note that these programs aren't mine (the first is a big and ugly .bat file and the second a proprietary soft) so I can't modify them (ok, I can extract all that I need from the .bat but it's very cumbersome).
N.B.: I am using Windows, but I would prefer a cross-platform solution (though my problem wouldn't happen on a Unix-like ...)
Here's an example of how you can extract environment variables from a batch or cmd file without creating a wrapper script. Enjoy.
from __future__ import print_function
import sys
import subprocess
import itertools

def validate_pair(ob):
    try:
        if not (len(ob) == 2):
            print("Unexpected result:", ob, file=sys.stderr)
            raise ValueError
    except:
        return False
    return True

def consume(iter):
    try:
        while True: next(iter)
    except StopIteration:
        pass

def get_environment_from_batch_command(env_cmd, initial=None):
    """
    Take a command (either a single command or list of arguments)
    and return the environment created after running that command.
    Note that the command must be a batch file or .cmd file, or the
    changes to the environment will not be captured.
    If initial is supplied, it is used as the initial environment passed
    to the child process.
    """
    if not isinstance(env_cmd, (list, tuple)):
        env_cmd = [env_cmd]
    # construct the command that will alter the environment
    env_cmd = subprocess.list2cmdline(env_cmd)
    # create a tag so we can tell in the output when the proc is done
    tag = 'Done running command'
    # construct a cmd.exe command to accomplish this
    cmd = 'cmd.exe /s /c "{env_cmd} && echo "{tag}" && set"'.format(**vars())
    # launch the process
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, env=initial)
    # parse the output sent to stdout
    lines = proc.stdout
    # consume whatever output occurs until the tag is reached
    consume(itertools.takewhile(lambda l: tag not in l, lines))
    # define a way to handle each KEY=VALUE line
    handle_line = lambda l: l.rstrip().split('=', 1)
    # parse key/values into pairs
    pairs = map(handle_line, lines)
    # make sure the pairs are valid
    valid_pairs = filter(validate_pair, pairs)
    # construct a dictionary of the pairs
    result = dict(valid_pairs)
    # let the process finish
    proc.communicate()
    return result
So to answer your question, you would create a .py file that does the following:
env = get_environment_from_batch_command('proc1')
subprocess.Popen('proc2', env=env)
As you say, processes don't share the environment - so what you literally ask is not possible, not only in Python, but in any programming language.
What you can do is to put the environment variables in a file, or in a pipe, and either
have the parent process read them, and pass them to proc2 before proc2 is created, or
have proc2 read them, and set them locally
The latter would require cooperation from proc2; the former requires that the variables become known before proc2 is started.
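For example, the first option might look roughly like this on Windows (an untested sketch; env_dump.txt is an arbitrary name, and proc1.bat/proc2 are the commands from the question):
import subprocess

# run the batch file and dump the resulting environment to a file
subprocess.call('proc1.bat && set > env_dump.txt', shell=True)

# read the dump back into a dict of KEY=VALUE pairs
env = {}
with open('env_dump.txt') as f:
    for line in f:
        if '=' in line:
            key, _, value = line.rstrip('\n').partition('=')
            env[key] = value

# start the second program with that environment
subprocess.call('proc2', env=env)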
Since you're apparently in Windows, you need a Windows answer.
Create a wrapper batch file, e.g. "run_program.bat", and run both programs:
#echo off
call proc1.bat
proc2
The first script will run and set its environment variables. Both scripts run in the same interpreter (cmd.exe instance), so the variables proc1.bat sets will still be set when proc2 is executed.
Not terribly pretty, but it'll work.
(Unix people, you can do the same thing in a bash script: "source file.sh".)
You can use Process in psutil to get the environment variables for that Process.
If you want to implement it yourself, you can refer to the internal implementation of psutil. It adapts to several operating systems.
Currently supported operating systems are:
AIX
FreeBSD, OpenBSD, NetBSD
Linux
macOS
Sun Solaris
Windows
E.g.: on Linux, you can find the environment variables of a process with pid 7877 in the file /proc/7877/environ; just open it in rt mode to read it.
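A rough sketch of reading it manually (assuming you have permission to read that process's /proc entry; the entries are NUL-separated, and reflect the environment as it was at exec time):
import os

def read_proc_environ(pid):
    env = {}
    with open('/proc/{}/environ'.format(pid), 'rb') as f:
        for entry in f.read().split(b'\x00'):
            if b'=' in entry:
                key, _, value = entry.partition(b'=')
                env[key.decode()] = value.decode()
    return env

print(read_proc_environ(os.getpid()))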
Of course the best way to do this is to:
import os
from typing import Dict
from psutil import Process
process = Process(pid=os.getpid())
process_env: Dict = process.environ()
print(process_env)
You can find the other platform implementations in the source code.
Hope I can help you.
The Python standard module multiprocessing has a Queue system that allows you to pass pickleable objects between processes. Also, processes can exchange messages (a pickled object) using os.pipe. Remember that resources (e.g. database connections) and handles (e.g. file handles) can't be pickled.
You may find this link interesting :
Communication between processes with multiprocessing
Also the PyMOTW article about multiprocessing is worth mentioning:
multiprocessing Basics
sorry for my spelling
Two things spring to mind: (1) make the processes share the same environment, by combining them somehow into the same process, or (2) have the first process produce output that contains the relevant environment variables, that way Python can read it and construct the environment for the second process. I think (though I'm not 100% sure) that there isn't any way to get the environment from a subprocess as you're hoping to do.
Environment is inherited from the parent process. Set the environment you need in the main script, not a subprocess (child).
