My Python script needs to be killed every hour, and afterwards I need to restart it. I need this because it's possible that sometimes (I create screenshots) a browser window hangs because of a user login popup or something. Anyway, I created 2 files, 'reload.py' and 'screenshot.py'. I run reload.py via cronjob.
I thought something like this would work
import os

# kill process if still running
try:
    os.system("killall -9 screenshotTaker")
except:
    print 'nothing to kill'

# reload or start process
os.execl("/path/to/script/screenshots.py", "screenshotTaker")
The problem is, and it's also what I read, that the second argument of execl (the given process name) doesn't work. How can I set a process name so the kill does its work?
Thanks in advance!
The first argument to os.execl is the path to the executable. The remaining arguments are passed to that executable as if they were typed on the command line.
If you want "screenshotTaker" become the name of the process, that is "screenshots.py" responsibility to do so. Do you do something special in that sense in that script?
BTW, a more common approach is to keep track (in /var/run/ usually) of the PID of the running program, and kill it by PID. This can be done in Python using os.kill; a small sketch follows the excerpt below. At system level, some distributions have helpers for that exact purpose. For example, on Debian there is start-stop-daemon. Here is an excerpt of its man page:
start-stop-daemon(8) dpkg utilities start-stop-daemon(8)
NAME
start-stop-daemon - start and stop system daemon programs
SYNOPSIS
start-stop-daemon [options] command
DESCRIPTION
start-stop-daemon is used to control the creation and termination of
system-level processes. Using one of the matching options,
start-stop-daemon can be configured to find existing instances of a
running process.
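For illustration, here is a minimal sketch of the PID-file approach in pure Python (the PID-file location is an assumption, and the exec call just mirrors the one from the question):

import os
import signal

PIDFILE = "/var/run/screenshotTaker.pid"  # assumed location

# kill the previous instance, if a PID file was left behind
if os.path.exists(PIDFILE):
    with open(PIDFILE) as f:
        old_pid = int(f.read().strip())
    try:
        os.kill(old_pid, signal.SIGKILL)
    except OSError:
        pass  # the process is already gone

# record our own PID (exec keeps the PID), then hand over to the screenshot script
with open(PIDFILE, "w") as f:
    f.write(str(os.getpid()))
os.execl("/path/to/script/screenshots.py", "screenshotTaker")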
Related
I need some help, experts.
I found a topic on stackoverflow.com about clone process support in Python.
So, as it suggests, I call a function to create a child process in a new Linux namespace.
import ctypes
import signal

# libc, STACK_SIZE and the CLONE_* constants are defined elsewhere in the script
def ns_clone(func):
    stack = ctypes.c_char_p((' ' * STACK_SIZE).encode("utf_8"))
    stack_top = ctypes.cast(stack, ctypes.c_void_p).value + STACK_SIZE
    pid = libc.clone(ctypes.CFUNCTYPE(None)(func), ctypes.c_void_p(stack_top),
                     CLONE_NEWNS | CLONE_NEWUTS | signal.SIGCHLD)
    if pid < 0:
        libc.perror(b"clone")
The clone is successful, and the new namespace also works fine. But I have run into a problem:
In the child process, I want to fork more jailed processes to serve requests. So I try to set handlers for SIGALRM and SIGCLD (I want to handle request timeouts and detect when grandchild processes terminate). But it never works. No error is raised, but it seems this Python child process simply ignores what I set.
signal.signal(signal.SIGALRM, _timeout_callback)
signal.signal(signal.SIGCLD, _child_term_callback)
signal.alarm(15)
I have investigated this for several days, but I'm really a newbie in C and OS-level programming. The only clue I found is that if I use a normal os.fork, the problem does not exist. Another is that the Python documentation says a Python signal handler is not the original OS-level handler, and that Python handlers always run in the main thread of the process. But I'm not sure whether that is related.
Is it possible to set signal handlers in a Python process after the Linux clone syscall? Thanks for your help!
My environment: Ubuntu 18.04 Desktop/CPython 3.7
Finally, I used Python's own os.fork to create the sub-process and used unshare to enter the namespace.
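For reference, a minimal sketch of that fork-plus-unshare approach (the CLONE_* values are the standard ones from linux/sched.h; the handler name and timeout are illustrative, and unsharing mount/UTS namespaces requires root or CAP_SYS_ADMIN):

import ctypes
import os
import signal

libc = ctypes.CDLL("libc.so.6", use_errno=True)
CLONE_NEWNS = 0x00020000
CLONE_NEWUTS = 0x04000000

def _timeout_callback(signum, frame):
    # illustrative handler; the real one would abort the hung request
    print("request timed out")

pid = os.fork()
if pid == 0:
    # child: move into fresh mount/UTS namespaces, then install handlers as usual
    if libc.unshare(CLONE_NEWNS | CLONE_NEWUTS) != 0:
        raise OSError(ctypes.get_errno(), "unshare() failed")
    signal.signal(signal.SIGALRM, _timeout_callback)
    signal.alarm(15)
    # ... fork and serve the jailed worker processes here ...
    os._exit(0)
else:
    os.waitpid(pid, 0)

Because the child was created by os.fork rather than a raw clone(2), the signal handlers behave normally, which matches the observation above.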
I am using systemd on Raspbian to run a Python script script.py. The my.service file looks like this:
[Unit]
Description=My Python Script
Requires=other.service
[Service]
Restart=always
ExecStart=/home/script.py
ExecStop=/home/script.py
[Install]
WantedBy=multi-user.target
When other.service (the unit listed in Requires=) stops, I want my.service to stop immediately and also terminate the Python process running script.py.
However, when trying this out by stopping other.service and then monitoring the state of my.service using systemctl, it seems to take a good while for my.service to actually enter a 'failed' (stopped) state. Calling ExecStop on the script does not seem to be enough to terminate my.service itself, and the underlying script.py, in a timely manner.
Just to be extra clear: I want the script to terminate pretty immediately in a way that is analogous to Ctrl + C. Basic Python clean-up is OK, but I don't want systemd to be waiting for a 'graceful' response time-out, or something like that.
Questions:
Is my interpretation of the delay correct, or is it just systemctl that is slow to update its status overview?
What is the recommended way to stop the service and terminate the script? Should I include some sort of SIGINT catching in the Python script? If so, how? Or is there something that can be done in my.service to expedite stopping the service and killing the script?
I think you should look into TimeoutStopSec and its default, DefaultTimeoutStopSec. At the provided links there is some more information about WatchdogSec and other options that you might find useful. It looks like DefaultTimeoutStopSec defaults to 90 seconds, which might be the delay you are experiencing?
Under the [Unit] section options you could use Requisite=other.service. This is similar to Requires=; however, if the units listed here are not already started, they will not be started and the transaction will fail immediately.
For triggering script execution again, under the [Unit] section you can use OnFailure=, a space-separated list of one or more units that are activated when this unit enters the "failed" state.
Also, the BindsTo= option configures requirement dependencies, very similar in style to Requires=; however, in addition to this behaviour, it also declares that this unit is stopped when any of the units listed suddenly disappears. Units can suddenly, unexpectedly disappear if a service terminates of its own choice, a device is unplugged, or a mount point is unmounted without involvement of systemd.
I think in your case BindsTo= is the option to use, since it causes the current unit to stop when the associated unit terminates; a sketch follows below.
From the systemd.unit man page.
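For example, a sketch of how my.service might look with BindsTo= (the interpreter path, the TimeoutStopSec value and the use of SIGINT are assumptions; ExecStop= is dropped because systemd sends the stop signal itself):

[Unit]
Description=My Python Script
BindsTo=other.service
After=other.service

[Service]
Restart=always
ExecStart=/usr/bin/python3 /home/script.py
# SIGINT is what Ctrl + C sends; systemd's default stop signal is SIGTERM
KillSignal=SIGINT
# don't wait the default 90 s for a graceful exit
TimeoutStopSec=5

[Install]
WantedBy=multi-user.target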
I read How do you create a daemon in Python? and also this topic, and tried to write a very simple daemon:

import daemon
import time

with daemon.DaemonContext():
    while True:
        with open('a.txt', 'a') as f:
            f.write('Hi')
        time.sleep(2)
Running python script.py works and returns to the terminal immediately (that's the expected behaviour). But a.txt is never written and I don't get any error message. What's wrong with this simple daemon?
daemon.DaemonContext() has an option working_directory whose default value is /, i.e. your program probably doesn't have permission to create a new file there.
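For example, a minimal variant of the script above that keeps files next to the script itself (the directory choice is just an illustration):

import os
import daemon
import time

# write a.txt next to this script instead of in / (the default working_directory)
here = os.path.dirname(os.path.abspath(__file__))

with daemon.DaemonContext(working_directory=here):
    while True:
        with open('a.txt', 'a') as f:
            f.write('Hi')
        time.sleep(2)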
The problem described here is solved by J.J. Hakala's answer.
Two additional (important) things:
Sander's code (mentioned here) is better than python-daemon. It is more reliable. Just one example: try to start the same daemon twice with python-daemon: big ugly error. With Sander's code: a nice notice, "Daemon already running."
For those who want to use python-daemon anyway: DaemonContext() only makes a daemon. DaemonRunner() makes a daemon + a control tool, allowing you to do python script.py start, stop, etc.
One thing that's wrong with it is that it has no way to tell you what's wrong with it :-)
A daemon process is, by definition, detached from the parent process and from any controlling terminal. So if it's got something to say – such as error messages – it will need to arrange that before becoming a daemon.
From the python-daemon FAQ document:
Why does the output stop after opening the daemon context?
The specified behaviour in PEP 3143 includes the requirement to
detach the process from the controlling terminal (to allow the process
to continue to run as a daemon), and to close all file descriptors not
known to be safe once detached (to ensure any files that continue to
be used are under the control of the daemon process).
If you want the process to generate output via the system streams
‘sys.stdout’ and ‘sys.stderr’, set the ‘DaemonContext’'s ‘stdout’
and/or ‘stderr’ options to a file-like object (e.g. the ‘stream’
attribute of a ‘logging.Handler’ instance). If these objects have file
descriptors, they will be preserved when the daemon context opens.
Set up a working channel of communication, such as a log file. Ensure the files you open aren't closed along with everything else, using the files_preserve option. Then log any errors to that channel.
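A hedged sketch of that pattern, using python-daemon's files_preserve option and a made-up log path:

import logging
import daemon

logger = logging.getLogger("mydaemon")
handler = logging.FileHandler("/tmp/mydaemon.log")  # assumed location
logger.addHandler(handler)
logger.setLevel(logging.INFO)

context = daemon.DaemonContext(
    # keep the log file's descriptor open across daemonisation
    files_preserve=[handler.stream],
)

with context:
    try:
        logger.info("daemon started")
        # ... the real work goes here ...
    except Exception:
        logger.exception("daemon crashed")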
I need to create a daemon in Python. I searched and found a good piece of code. The daemon should be started automatically after the system boots, and it should be restarted if it was unexpectedly closed. I went through the chapter about daemons in Advanced Programming in the UNIX Environment and have two questions.
To run the script automatically after boot, I need to put my daemon script in /etc/init.d. Is that correct?
What should I do to respawn the daemon? According to the book I need to add a respawn entry to /etc/inittab, but I don't have /etc/inittab on my system. Should I create it myself?
I suggest you look into upstart if you're on Ubuntu. It's way better than inittab but does involve some learning curve to be honest.
Edit (by Blair): here is an adapted example of an upstart script I wrote for one of my own programs recently. A basic upstart script like this is fairly readable/understandable, though (like many such things) they can get complicated when you start doing fancy stuff.
description "mydaemon - my cool daemon"
# Start and stop conditions. Runlevels 2-5 are the
# multi-user (i.e, networked) levels. This means
# start the daemon when the system is booted into
# one of these runlevels and stop when it is moved
# out of them (e.g., when shut down).
start on runlevel [2345]
stop on runlevel [!2345]
# Allow the service to respawn automatically, but if
# crashes happen too often (10 times in 5 seconds)
# there's a real problem and we should stop trying.
respawn
respawn limit 10 5
# The program is going to daemonise (double-fork), and
# upstart needs to know this so it can track the change
# in PID.
expect daemon
# Set the mode the process should create files in.
umask 022
# Make sure the log folder exists.
pre-start script
mkdir -p -m0755 /var/log/mydaemon
end script
# Command to run it.
exec /usr/bin/python /path/to/mydaemon.py --logfile /var/log/mydaemon/mydaemon.log
To create a daemon, use double fork() as shown in the code you found.
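For readers who have not seen it, here is a minimal sketch of the classic double-fork pattern (this is not the exact code the asker found, just the standard idea):

import os
import sys

def daemonize():
    # first fork: return control to the shell
    if os.fork() > 0:
        sys.exit(0)
    os.setsid()  # become session leader, detach from the controlling terminal
    # second fork: make sure the daemon can never reacquire a terminal
    if os.fork() > 0:
        sys.exit(0)
    os.chdir('/')
    os.umask(0)
    # point the standard streams at /dev/null
    devnull = os.open(os.devnull, os.O_RDWR)
    for fd in (0, 1, 2):
        os.dup2(devnull, fd)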
Then you need to write an init script for your daemon and copy it into /etc/init.d/.
http://www.novell.com/coolsolutions/feature/15380.html
There are many ways to specify how the daemon will be auto-started, e.g., chkconfig.
http://linuxcommand.org/man_pages/chkconfig8.html
Or you can manually create the symlinks for certain runlevels.
Finally, you need to restart the service when it unexpectedly exits. You may include a respawn entry for the service in /etc/inittab.
http://linux.about.com/od/commands/l/blcmdl5_inittab.htm
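A respawn entry would look roughly like this (the id and path are made up; note that a process supervised by init this way must stay in the foreground, i.e. it must not double-fork itself, or init will keep spawning new copies):

# hypothetical /etc/inittab line: respawn the daemon in runlevels 2-5
md:2345:respawn:/usr/bin/python /path/to/mydaemon.py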
I am using a cluster of computers to do some parallel computation. My home directory is shared across the cluster. On one machine, I have Ruby code that creates bash scripts containing the computation commands and writes the scripts to, say, the ~/q/ directory. The scripts are named *.worker1.sh, *.worker2.sh, etc.
On the other 20 machines, I have 20 Python scripts running (one on each machine) that (constantly) check the ~/q/ directory and look for jobs that belong to that machine, using Python code like this:
import glob
import os

jobs = glob.glob('q/*.worker1.sh')
[os.system('sh ' + job + ' &') for job in jobs]
For some additional control, the Ruby code creates an empty file like workeri.start (i = 1..20) in the q directory after it writes the bash script there; the Python code checks for that 'start' file before it runs the code above. In the bash script, if the command finishes successfully, the bash script creates an empty file like 'workeri.success'; the Python code checks for this file after running the code above to make sure the computation finished successfully. If Python finds that the computation finished successfully, it removes the 'start' file from the q directory, so the Ruby code knows that the job finished. After all 20 bash scripts have finished, the Ruby code creates new bash scripts, Python reads and executes the new scripts, and so on.
I know this is not an elegant way to coordinate the computation, but I haven't figured out a better way to communicate between different machines.
Now the question is: I expect the 20 jobs to run somewhat in parallel. The total time to finish the 20 jobs should not be much longer than the time to finish one job. However, it seems that these jobs run sequentially, and the total time is much longer than I expected.
I suspect part of the reason is that multiple processes are reading and writing the same directory at once, but the Linux system or Python locks the directory and only allows one process to operate on it at a time. This would make the code execute one job at a time.
I am not sure if this is the case. If I split the bash scripts into different directories, and let the Python code on different machines read and write different directories, will that solve the problem? Or are there other reasons causing it?
Thanks a lot for any suggestions! Let me know if I didn't explain anything clearly.
Some additional info:
my home directory is at /home/my_group/my_home, here is the mount info for it
:/vol/my_group on /home/my_group type nfs (rw,nosuid,nodev,noatime,tcp,timeo=600,retrans=2,rsize=65536,wsize=65536,addr=...)
When I say constantly check the q directory, I mean a Python loop like this:

while True:
    if os.path.exists('q/worker1.start'):   # the 'start' file
        # find the scripts and execute them as I mentioned above
        pass
I know this is not an elegant way to coordinate the computation, but I haven't figured out a better way to communicate between different machines.
While this isn't directly what you asked, you should really, really consider fixing your problem at this level: using some sort of shared message queue is likely to be a lot simpler to manage and debug than relying on the locking semantics of a particular networked filesystem.
The simplest solution to set up and run, in my experience, is Redis on the machine currently running the Ruby script that creates the jobs. It should literally be as simple as downloading the source, compiling it and starting it up. Once the Redis server is up and running, you change your code to append the computation commands to one or more Redis lists. In Ruby you would use the redis-rb library like this:
require "redis"
redis = Redis.new
# Your other code to build up command lists...
redis.lpush 'commands', command1, command2...
If the computations need to be handled by certain machines, use a list per-machine like this:
redis.lpush 'jobs:machine1', command1
# etc.
Then in your Python code, you can use redis-py to connect to the Redis server and pull jobs off the list like so:
import os
from redis import Redis

r = Redis(host="hostname-of-machine-running-redis")
while r.llen('jobs:machine1'):
    # lpop returns bytes/str depending on the Python version; decode to be safe
    job = r.lpop('jobs:machine1').decode()
    os.system('sh ' + job + ' &')
Of course, you could just as easily pull jobs off the queue and execute them in Ruby:
require 'redis'
redis = Redis.new(:host => 'hostname-of-machine-running-redis')
while redis.llen('jobs:machine1') > 0
  job = redis.lpop('jobs:machine1')
  `sh #{job} &`
end
With some more details about the needs of the computation and the environment it's running in, it would be possible to recommend even simpler approaches to managing it.
Try a while loop? If that doesn't work, on the Python side try using a try statement like so:

try:
    with open("myfile.whatever", "r") as f:
        f.read()
except:
    # do something if it doesn't work, perhaps just pass
    # (must be in a loop to constantly check this)
    pass
else:
    # execute your code if successful
    pass