Using Python 3.6.1, I am emulating launching an airflow webserver from the command line as a child process using subprocess.Popen. After doing some things, I later want to kill (or terminate) it.
webserver_process = subprocess.Popen(["airflow", "webserver"])
webserver_process.kill()
My understanding is that this will send a SIGKILL to the webserver, whose underlying gunicorn should shut down immediately.
However, when I navigate to http://localhost:8080 I see that the webserver is still running. Similarly, when I then run sudo netstat -nlp | grep 8080 (I am on UNIX, and airflow webserver listens on port 8080), I discover:
tcp 0 0 0.0.0.0:8080 0.0.0.0:* LISTEN
It's only when I kill the process manually using sudo fuser -k 8080/tcp that it finally dies.
What's going on here?
The Python process started by the airflow webserver command actually calls subprocess.Popen itself to start gunicorn in a subprocess.
You can verify this by checking webserver_process.pid: you'll notice it's a different PID from the gunicorn master process's PID, so killing it doesn't touch gunicorn.
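A minimal sketch of one common workaround, assuming a POSIX system (this is not part of the original answer): start the child in its own session so it leads a fresh process group, then signal the whole group so the gunicorn master and workers go down with it.

import os
import signal
import subprocess

# Start the webserver as the leader of a new session/process group.
webserver_process = subprocess.Popen(
    ["airflow", "webserver"],
    start_new_session=True,
)

# ... later ...

# Signal the whole process group, not just the direct child, so the
# gunicorn master and its workers are killed as well.
os.killpg(os.getpgid(webserver_process.pid), signal.SIGKILL)
webserver_process.wait()  # reap the child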
Related
For my setup, I have a host machine and a remote machine, such that I have direct ssh access from the host machine to the remote one.
I'm using the host machine to start up (and possibly stop) a server, which is a long-running process. For that I use subprocess.Popen, where the command looks something like this:
ssh remote_machine "cd dir ; python3 long_running_process.py"
via
p = subprocess.Popen(['ssh', 'remote_machine', 'cd dir ; python3 long_running_process.py'])
From what I gather, even though the Popen call uses shell=False, ssh itself still runs the cd and python commands under a shell (such as bash) on the remote machine.
The problem arises when I want to stop this process, or, more crucially, when an exception is raised in the long-running process: I need to clean up and stop all processes on the host and, most importantly, on the remote machine.
Terminating the Popen process on the host machine does not suffice (I actually send a SIGINT so that I can catch it on the remote side, but that doesn't work), as the long-running process keeps running on the remote machine.
So if an exception is raised by the child processes of the long-running process, the long-running process itself is not stopped.
Do I have to ssh again to stop the processes? (I don't know the children's PIDs up front.)
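A sketch of one possible approach, not from the thread and resting on assumptions about the setup: force a remote pseudo-terminal with ssh -tt so the remote shell (and its children) receives a hangup when the local ssh process is terminated, and fall back to a second ssh that kills by process name rather than by PID.

import subprocess

# -tt forces a remote pty; when the local ssh process is terminated, the
# remote shell gets SIGHUP and takes its children down with it.
p = subprocess.Popen(
    ["ssh", "-tt", "remote_machine", "cd dir ; python3 long_running_process.py"]
)

try:
    p.wait()
except KeyboardInterrupt:
    p.terminate()
    p.wait()

# If anything survives on the remote side, clean up by name instead of PID
# (the process name here is just the script from the question).
subprocess.run(["ssh", "remote_machine", "pkill -f long_running_process.py"])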
I need to connect to a target device over a proxy to execute some commands on the target. To do this, I need to open an SSH tunnel to the proxy, and then use a Python library to interact with the target over SSH. The library is not capable of accommodating proxied connections. This works when I use my shell directly to bring up the tunnel and then use the Python library to interact with the target. I now need to move the shell command into my Python program.
I tried opening an SSH tunnel using subprocess with the following code:
import shlex
import subprocess

config_file = "path/to/config"
cmd = shlex.split(f"ssh -f -N jumphost-tunnel -F {config_file}")
process = subprocess.Popen(
    cmd, stdin=None, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
)
This creates two problems.
Problem 1
When I call process.pid, the PID is different from what I see when I execute ps aux | grep ssh and note the PID on the OS. It is off by 1 (e.g. the PID from process.pid is 44196, while the PID from ps aux is 44197).
I would like to understand why the PID is off by 1. Is it due to the SSH process being placed in the background when called with ssh -f?
Problem 2
It leaves a zombie SSH tunnel behind, as I cannot terminate the tunnel with process.kill() because I don't know the PID of the tunnel process.
How can I safely and reliably terminate the SSH tunnel when the program completes?
For some background, I need to tunnel to a proxy server and execute a command on a target device over SSH. The target device is a Juniper SRX. I'm using the PyEZ-junos library to interact with it. The library uses Paramiko under the hood to interact with the Junos device, but the library implementation does not make use of the ProxyCommand or ProxyJump directives made available by OpenSSH, hence the call to subprocess to initiate the tunnel to the proxy server. I don't want to change the internals of the PyEZ library to fix the tunneling issue.
I haven't checked, but it would surprise me if the off-by-one PID is not caused by ssh “backgrounding” itself by forking a new process and letting the original process exit.
I don't think you really need the -f flag. subprocess.Popen starts a new process anyway.
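A minimal sketch of that suggestion, reusing the names from the question (the config path and the jumphost-tunnel host alias are whatever your SSH config defines): without -f, ssh stays in the foreground, so process.pid really is the tunnel's PID and terminating the Popen object closes the tunnel.

import shlex
import subprocess

config_file = "path/to/config"
cmd = shlex.split(f"ssh -N jumphost-tunnel -F {config_file}")

# Without -f, this ssh process *is* the tunnel.
process = subprocess.Popen(cmd)
try:
    # ... interact with the target through the tunnel here ...
    pass
finally:
    process.terminate()  # close the tunnel when the program completes
    process.wait()       # reap the child so nothing is left behind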
I get the following error when I try to run my edX LMS (port 8000):
Error: That port is already in use
So in my vagrant account I found the process that was using port 8000 and did kill -9 on it. But as soon as I killed it, the process automatically restarted and took port 8000 again, and I am unable to run the LMS.
When that happens, I just do:
vagrant reload
(You will have to log out of the SSH session first by typing logout.)
It is equivalent to:
vagrant halt
vagrant up
I've had times on OS X with Vagrant where I've had to kill not only the vagrant process but also virtualbox, when vagrant reload hasn't worked.
On your machine (not the guest VM):
ps -eaf | fgrep -i vagrant
ps -eaf | fgrep -i virtualbox
Then kill all those processes and run vagrant up.
vagrant halt is enough to kill all the processes related to the used port.
We're using Supervisord to run workers started by our Gearman job server. To remove a job from the queue, we have to run:
$ sudo killall supervisord
to kill all Supervisord subprocesses so the job doesn't spawn when removed, and then
$ gearman -n -w -f FUNCTION_NAME > /dev/null
to remove the job completely from the server.
Is there a way to kill only one Supervisord subprocess instead of using killall? For instance, if we have multiple jobs running and a single job is running longer than it should, or starts throwing errors, how can we kill the subprocess and remove the job from the server without killing all subprocesses?
Yes: use supervisorctl to interact with supervisord; supervisorctl stop <process_name> stops a single managed process. If you need to do so programmatically, there's a web service (XML-RPC) interface.
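For the programmatic route, a short sketch using supervisord's XML-RPC interface; the URL, port, and worker name below are assumptions, and the [inet_http_server] section must be enabled in supervisord.conf.

from xmlrpc.client import ServerProxy

# Connect to supervisord's XML-RPC endpoint (default inet_http_server port).
server = ServerProxy("http://localhost:9001/RPC2")

# Stop just the misbehaving worker instead of killing all of supervisord.
server.supervisor.stopProcess("my_gearman_worker")

# Confirm the state of the remaining managed processes.
for info in server.supervisor.getAllProcessInfo():
    print(info["name"], info["statename"])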
I have written a simple automation script for deploying and restarting my twisted application on a remote Debian host. But I have an issue with starting it using twistd.
I have a run.tac file and start my application as follows inside fabric task:
from fabric.api import task, run

@task
def start():
    run("twistd -y run.tac")
And then just fab -H host_name start. It works great on localhost, but when I want to start the application on the remote host I get nothing. I can see in the log file that the application is actually launched, but the factory is not started. I've also checked netstat -l: nothing is listening on my port.
I've tried running in non-daemon mode, like so: twistd -ny run.tac, and, voila, the factory started and I can see it in netstat -l on the remote host. But that is not the way I want it to work. Any help is appreciated.
There was an issue reported some time back which is similar to this:
Init scripts frequently fail to start their daemons
init-scripts-dont-work
It was also suggested that it seems to succeed with the option pty=False. Can you try that and check?
run("twistd -y run.tac", pty=False)
Some more pointers from the FAQ:
why-can-t-i-run-programs-in-the-background-with-it-makes-fabric-hang
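Putting the suggestion together, the start task would look something like this (a sketch assuming Fabric 1.x's fabric.api):

from fabric.api import task, run

@task
def start():
    # pty=False keeps Fabric from allocating a pseudo-terminal, so the
    # daemonized twistd process isn't tied to the remote tty's lifetime.
    run("twistd -y run.tac", pty=False)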