I have a Python-based service daemon which is doing a lot of multiplexed I/O (select).
From another script (also Python) I want to query this service daemon about status/information and/or control the processing (e.g. pause it, shut it down, change some parameters, etc).
What is the best way to send control messages ("from now on you process like this!") and query processed data ("what was the result of that?") using Python?
I read somewhere that named pipes might work, but I don't know much about named pipes, especially in Python, and whether there are any better alternatives.
Both the background service daemon AND the frontend will be programmed by me, so all options are open :)
I am using Linux.
Pipes and named pipes are a good solution for communicating between different processes.
A pipe works like a shared memory buffer, but with an interface that mimics a simple file on each of its two ends. One process writes data at one end of the pipe, and another reads that data at the other end.
Named pipes are similar, except that the pipe is actually associated with a real file on your computer.
More details at
http://www.softpanorama.org/Scripting/pipes.shtml
In Python, a named pipe file is created with the os.mkfifo call:
os.mkfifo(filename)
The two processes then open this pipe as a file, one end for writing and one for reading:
out = os.open(filename, os.O_WRONLY)
fifo = open(filename, 'r')
To write:
os.write(out, b'xxxx')
To read:
line = fifo.readline()
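Since the daemon already multiplexes with select(), the FIFO's file descriptor can simply be added to the select set. Below is a minimal, hypothetical sketch of that idea; the control path and the command names are made up for illustration and are not part of the question:
import os
import select

CONTROL_FIFO = '/tmp/mydaemon.ctl'   # hypothetical control path

if not os.path.exists(CONTROL_FIFO):
    os.mkfifo(CONTROL_FIFO)

# O_RDWR keeps a writer reference open on our side, so select() does not
# report end-of-file forever once a control client disconnects.
ctl_fd = os.open(CONTROL_FIFO, os.O_RDWR | os.O_NONBLOCK)

other_fds = []                       # the daemon's existing descriptors

while True:
    readable, _, _ = select.select([ctl_fd] + other_fds, [], [])
    for fd in readable:
        if fd == ctl_fd:
            command = os.read(ctl_fd, 1024).decode().strip()
            if command == 'pause':
                pass                 # pause processing here
            elif command == 'shutdown':
                raise SystemExit
        else:
            pass                     # normal multiplexed I/O here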
Edit: Adding links from SO
Create a temporary FIFO (named pipe) in Python?
https://stackoverflow.com/search?q=python+named+pipes
You may want to read more on "IPC and Python"
http://www.freenetpages.co.uk/hp/alan.gauld/tutipc.htm
One good way to do IPC in Python is to use message queues exposed through a multiprocessing manager, as below.
The server process, server.py (run this before running interact.py and client.py):
from multiprocessing.managers import BaseManager
import queue

queue1 = queue.Queue()
queue2 = queue.Queue()

class QueueManager(BaseManager): pass

QueueManager.register('get_queue1', callable=lambda: queue1)
QueueManager.register('get_queue2', callable=lambda: queue2)

m = QueueManager(address=('', 50000), authkey=b'abracadabra')
s = m.get_server()
s.serve_forever()
The interactor, interact.py, which handles the console I/O:
from multiprocessing.managers import BaseManager
import threading
import sys

class QueueManager(BaseManager): pass

QueueManager.register('get_queue1')
QueueManager.register('get_queue2')

m = QueueManager(address=('localhost', 50000), authkey=b'abracadabra')
m.connect()
queue1 = m.get_queue1()
queue2 = m.get_queue2()

def read():
    while True:
        sys.stdout.write(queue2.get())
        sys.stdout.flush()

def write():
    while True:
        queue1.put(sys.stdin.readline())

threads = []

threadr = threading.Thread(target=read)
threadr.start()
threads.append(threadr)

threadw = threading.Thread(target=write)
threadw.start()
threads.append(threadw)

for thread in threads:
    thread.join()
The client program, client.py:
from multiprocessing.managers import BaseManager
import sys

class QueueManager(BaseManager): pass

QueueManager.register('get_queue1')
QueueManager.register('get_queue2')

m = QueueManager(address=('localhost', 50000), authkey=b'abracadabra')
m.connect()
queue1 = m.get_queue1()
queue2 = m.get_queue2()

class RedirectOutput:
    def __init__(self, stdout):
        self.stdout = stdout
    def write(self, s):
        queue2.put(s)
    def flush(self):
        pass

class RedirectInput:
    def __init__(self, stdin):
        self.stdin = stdin
    def readline(self):
        return queue1.get()

# redirect standard output and input to the queues
sys.stdout = RedirectOutput(sys.stdout)
sys.stdin = RedirectInput(sys.stdin)

# the test program, which takes input and produces output
text = input("Enter Text:")
print("you have entered:", text)

def loop():
    while True:
        line = input("Enter 'exit' to end and something else to continue")
        print(line)
        if 'exit' in line:
            break

loop()
This can be used to communicate between two processes over the network or on the same machine.
Remember that the interactor and server processes will not terminate until you kill them manually.
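As a hedged sketch, not part of the original answer: the daemon itself could connect to the manager the same way interact.py does and poll the command queue once per iteration of its select() loop, so the I/O loop never blocks waiting for control messages. The command strings shown are made up:
import queue
from multiprocessing.managers import BaseManager

class QueueManager(BaseManager): pass

QueueManager.register('get_queue1')

m = QueueManager(address=('localhost', 50000), authkey=b'abracadabra')
m.connect()
commands = m.get_queue1()

def check_for_command():
    # called once per select() iteration in the daemon's main loop
    try:
        return commands.get_nowait()   # e.g. 'pause', 'shutdown'
    except queue.Empty:
        return None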
Related
Right now, I'm using subprocess to run a long-running job in the background. For multiple reasons (PyInstaller + AWS CLI) I can't use subprocess anymore.
Is there an easy way to achieve the same thing as below? Running a long-running Python function in a multiprocessing pool (or something else) and doing real-time processing of stdout/stderr?
import subprocess

process = subprocess.Popen(
    ["python", "long-job.py"],
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
    shell=True,
)
while True:
    out = process.stdout.read(2000).decode()
    if not out:
        err = process.stderr.read().decode()
    else:
        err = ""

    if (out == "" or err == "") and process.poll() is not None:
        break

    live_stdout_process(out)
Thanks
Getting this to work cross-platform is messy. First of all, Windows' implementation of non-blocking pipes is neither user-friendly nor portable.
One option is to just have your application read its command-line arguments and conditionally execute a file; you would still get to use subprocess, since you would be launching yourself with a different argument.
But to keep it to multiprocessing:
The output must be logged to queues instead of pipes.
You need the child to execute a Python file; this can be done using runpy to execute the file as __main__.
That runpy call should run inside a multiprocessing child, and this child must first redirect its stdout and stderr in the initializer.
When an error happens, your main application must catch it, but if it is busy reading the output it won't be able to wait for the error, so a child thread has to start the multiprocessing worker and wait for the error.
The main process has to create the queues, launch the child thread, and read the output.
Putting it all together:
import multiprocessing
from multiprocessing import Queue
import sys
import concurrent.futures
import threading
import traceback
import runpy
import time

class StdoutQueueWrapper:
    """File-like object that forwards writes to a multiprocessing queue."""
    def __init__(self, queue: Queue):
        self._queue = queue

    def write(self, text):
        self._queue.put(text)

    def flush(self):
        pass

def function_to_run():
    # runpy.run_path("long-job.py", run_name="__main__")  # run long-job.py
    print("hello")    # print something
    raise ValueError  # error out

def initializer(stdout_queue: Queue, stderr_queue: Queue):
    # executed in the child process: redirect its stdout/stderr to the queues
    sys.stdout = StdoutQueueWrapper(stdout_queue)
    sys.stderr = StdoutQueueWrapper(stderr_queue)

def thread_function(child_stdout_queue, child_stderr_queue):
    # runs in a thread of the main process: launches the worker and waits for errors
    with concurrent.futures.ProcessPoolExecutor(
            1, initializer=initializer,
            initargs=(child_stdout_queue, child_stderr_queue)) as pool:
        result = pool.submit(function_to_run)
        try:
            result.result()
        except Exception:
            child_stderr_queue.put(traceback.format_exc())

if __name__ == "__main__":
    child_stdout_queue = multiprocessing.Queue()
    child_stderr_queue = multiprocessing.Queue()

    child_thread = threading.Thread(
        target=thread_function,
        args=(child_stdout_queue, child_stderr_queue),
        daemon=True)
    child_thread.start()

    while True:
        while not child_stdout_queue.empty():
            var = child_stdout_queue.get()
            print(var, end='')

        while not child_stderr_queue.empty():
            var = child_stderr_queue.get()
            print(var, end='')

        if not child_thread.is_alive():
            break

        time.sleep(0.01)  # check output every 0.01 seconds
Note that a direct consequence of running under multiprocessing is that if the child runs into a segmentation fault or some other unrecoverable error, the parent will also die; hence running yourself under subprocess may look like the better option if segfaults are expected.
I'm working with a Backend class which spawns a subprocess to perform the CPU-bound work. I have no control over that class and basically the only way of interaction is to create an instance backend = Backend() and submit work via backend.run(data) (this in turn submits the work to the subprocess and blocks until completion). Because these computations take quite some time, I'd like to perform them in parallel. Since the Backend class already spawns its own subprocess to perform the actual work, this appears to be an IO-bound situation.
So I thought about using multiple threads, each of which uses its own Backend instance. I could create these threads manually and connect them via queues. The following is an example implementation with some Backend mock class:
import os
import pty
from queue import Queue
from subprocess import PIPE, Popen
from threading import Thread

class Backend:
    def __init__(self):
        f, g = pty.openpty()
        self.process = Popen(
            ['bash'],  # example program
            text=True, bufsize=1, stdin=PIPE, stdout=g)
        self.write = self.process.stdin.write
        self.read = os.fdopen(f).readline

    def __enter__(self):
        self.write('sleep 2\n')  # startup work
        return self

    def __exit__(self, *exc):
        self.process.stdin.close()
        self.process.kill()

    def run(self, x):
        self.write(f'sleep {x} && echo "ok"\n')  # perform work
        return self.read().strip()

class Worker(Thread):
    def __init__(self, inq, outq, **kwargs):
        super().__init__(**kwargs)
        self.inq = inq
        self.outq = outq

    def run(self):
        with Backend() as backend:
            while True:
                data = self.inq.get()
                result = backend.run(data)
                self.outq.put((data, result))

task_queue = Queue()
result_queue = Queue()

n_workers = 3
threads = [Worker(task_queue, result_queue, daemon=True) for _ in range(n_workers)]
for thread in threads:
    thread.start()

data = [2]*7
for x in data:
    task_queue.put(x)
for _ in data:
    print(f'Result ready: {result_queue.get()}')
Since the Backend needs to perform some work at startup, I don't want to create a new instance for each task. Hence each Worker creates one Backend instance for its whole life cycle. It's also important that each of the workers has its own backend, so they won't interfere with each other.
Now here's the question: Can I also use concurrent.futures.ThreadPoolExecutor to accomplish this? It looks like the Executor.map method would be the right candidate, but I can't figure out how to ensure that each worker receives its own instance of Backend (which needs to be persistent between tasks).
The state of worker threads can be saved in the global namespace, e.g. as a dict. Then threading.current_thread can be used to save/load the state for each of the workers. contextlib.ExitStack can be used to handle Backend appropriately as a context manager.
from concurrent.futures import ThreadPoolExecutor
from contextlib import ExitStack
import os
import pty
from subprocess import PIPE, Popen
import threading

class Backend:
    ...

backends = {}
exit_stack = ExitStack()

def init_backend():
    backends[threading.current_thread()] = exit_stack.enter_context(Backend())

def compute(data):
    return data, backends[threading.current_thread()].run(data)

with exit_stack:
    with ThreadPoolExecutor(max_workers=3, initializer=init_backend) as executor:
        for result in executor.map(compute, [2]*7):
            print(f'Result ready: {result}')
(New to Python and OO - I apologize in advance if I'm being stupid here)
I'm trying to define a Python 3 class such that when an instance is created two subprocesses are also created. These subprocesses do some work in the background (sending and listening for UDP packets). The subprocesses also need to communicate with each other and with the instance (updating instance attributes based on what is received from UDP, among other things).
I am creating my subprocesses with os.fork because I don't understand how to use the subprocess module to send multiple file descriptors to child processes - maybe this is part of my problem.
The problem I am running into is how to kill the child processes when the instance is destroyed. My understanding is I shouldn't use destructors in Python because stuff should get cleaned up and garbage collected automatically by Python. In any case, the following code leaves the children running after it exits.
What is the right approach here?
import os
from time import sleep

class A:
    def __init__(self):
        sfp, pts = os.pipe()  # senderFromParent, parentToSender
        pfs, stp = os.pipe()  # parentFromSender, senderToParent
        pfl, ltp = os.pipe()  # parentFromListener, listenerToParent
        sfl, lts = os.pipe()  # senderFromListener, listenerToSender
        pid = os.fork()
        if pid:
            # parent
            os.close(sfp)
            os.close(stp)
            os.close(lts)
            os.close(ltp)
            os.close(sfl)
            self.pts = os.fdopen(pts, 'w')  # allow creator of A inst to
            self.pfs = os.fdopen(pfs, 'r')  # send and receive messages
            self.pfl = os.fdopen(pfl, 'r')  # to/from sender and
        else:                               # listener processes
            # sender or listener
            os.close(pts)
            os.close(pfs)
            os.close(pfl)
            pid = os.fork()
            if pid:
                # sender
                os.close(ltp)
                os.close(lts)
                sender(self, sfp, stp, sfl)
            else:
                # listener
                os.close(stp)
                os.close(sfp)
                os.close(sfl)
                listener(self, ltp, lts)

def sender(a, sfp, stp, sfl):
    sfp = os.fdopen(sfp, 'r')  # receive messages from parent
    stp = os.fdopen(stp, 'w')  # send messages to parent
    sfl = os.fdopen(sfl, 'r')  # receive messages from listener
    while True:
        # send UDP packets based on messages from parent and process
        # responses from listener (some responses passed back to parent)
        print("Sender alive")
        sleep(1)

def listener(a, ltp, lts):
    ltp = os.fdopen(ltp, 'w')  # send messages to parent
    lts = os.fdopen(lts, 'w')  # send messages to sender
    while True:
        # listen for and process incoming UDP packets, sending some
        # to sender and some to parent
        print("Listener alive")
        sleep(1)

a = A()
Running the above produces:
Sender alive
Listener alive
Sender alive
Listener alive
...
Actually, you should use destructors. Python objects have a __del__ method, which is called just before the object is garbage-collected.
In your case, you should define
def __del__(self):
    ...
within your class A that sends the appropriate kill signals to your child processes. Don't forget to store the child PIDs in your parent process, of course.
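For illustration, here is a minimal, hypothetical sketch of that idea; the child_pids attribute and the placeholder child loop are assumptions, not taken from the original code:
import os
import signal
import time

class A:
    def __init__(self):
        self.child_pids = []
        pid = os.fork()
        if pid:
            self.child_pids.append(pid)   # parent: remember the child's PID
        else:
            while True:                   # child: placeholder background work
                time.sleep(1)

    def __del__(self):
        # send a kill signal to every child this instance started, then reap it
        for pid in self.child_pids:
            try:
                os.kill(pid, signal.SIGTERM)
                os.waitpid(pid, 0)
            except (ProcessLookupError, ChildProcessError):
                pass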
As suggested here, you can create the child process using the multiprocessing module with the flag daemon=True.
Example:
from multiprocessing import Process
p = Process(target=f, args=('bob',))
p.daemon = True
p.start()
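Because the process is daemonic, multiprocessing will terminate it automatically when the parent process exits, so the children do not outlive the program the way they do in the os.fork version above.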
There's no point trying to reinvent the wheel. subprocess does all you want and more, though multiprocessing will simplify the process, so we'll use that.
You can use multiprocessing.Pipe to create connections and send messages back and forth between a pair of processes. You can make a pipe "duplex", so both ends can send and receive, if that's what you need. You can use multiprocessing.Manager to create a shared Namespace between processes (sharing state between the listener, the sender and the parent). There is a caveat with using manager-backed list, dict or Namespace objects: any mutable object assigned to them will not propagate changes made to that object until it is reassigned to the managed object.
e.g.
namespace.attr = {}
# change below not cascaded to other processes
namespace.attr["key"] = "value"
# force change to other processes
namespace.attr = namespace.attr
If you need more than one process writing to the same attribute, you will need synchronisation to prevent concurrent modification by one process from wiping out changes made by another.
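As a hedged sketch of that synchronisation (not from the original answer; add_entry and the attr name are made up), you can guard the read-modify-write of a managed attribute with a manager Lock:
from multiprocessing import Manager

manager = Manager()
namespace = manager.Namespace()
lock = manager.Lock()

namespace.attr = {}

def add_entry(key, value):
    with lock:                # serialise concurrent writers
        d = namespace.attr    # fetch the current copy from the manager
        d[key] = value
        namespace.attr = d    # reassign to publish the change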
Example code:
from multiprocessing import Process, Pipe, Manager

class Reader:
    def __init__(self, writer_conn, namespace):
        self.writer_conn = writer_conn
        self.namespace = namespace

    def read(self):
        self.namespace.msgs_recv = 0
        with self.writer_conn:
            try:
                while True:
                    obj = self.writer_conn.recv()
                    self.namespace.msgs_recv += 1
                    print("Reader got:", repr(obj))
            except EOFError:
                print("Reader has no more data to receive")

class Writer:
    def __init__(self, reader_conn, namespace):
        self.reader_conn = reader_conn
        self.namespace = namespace

    def write(self, msgs):
        self.namespace.msgs_sent = 0
        with self.reader_conn:
            for msg in msgs:
                self.reader_conn.send(msg)
                self.namespace.msgs_sent += 1

def create_child_processes(reader, writer, msgs):
    p_write = Process(target=Writer.write, args=(writer, msgs))
    p_write.start()

    # This is very important, otherwise the reader will hang after the writer
    # has finished. The placement of this statement, after p_write.start() but
    # before p_read.start(), also matters. Look up how file descriptors are
    # inherited by child processes on Unix and how any open fd to the write
    # side of a pipe will keep all read ends from seeing EOF.
    writer.reader_conn.close()

    p_read = Process(target=Reader.read, args=(reader,))
    p_read.start()

    return p_read, p_write

def run_mp_pipe():
    manager = Manager()
    namespace = manager.Namespace()
    read_conn, write_conn = Pipe()

    reader = Reader(read_conn, namespace)
    writer = Writer(write_conn, namespace)

    p_read, p_write = create_child_processes(reader, writer,
        msgs=["hello", "world", {"key", "value"}])

    print("starting")
    p_write.join()
    p_read.join()
    print("done")

    print(namespace)
    assert namespace.msgs_sent == namespace.msgs_recv

if __name__ == "__main__":
    run_mp_pipe()
Output:
starting
Reader got: 'hello'
Reader got: 'world'
Reader got: {'key', 'value'}
Reader has no more data to receive
done
Namespace(msgs_recv=3, msgs_sent=3)
I'm running two Python threads (import threading). Both of them are blocked on an open() call; in fact they try to open named pipes in order to write to them, so it's normal behaviour to block until somebody tries to read from the named pipe.
In short, it looks like:
import threading

def f():
    open('pipe2', 'r')

if __name__ == '__main__':
    t = threading.Thread(target=f)
    t.start()
    open('pipe1', 'r')
When I type ^C, the open() in the main thread is interrupted (it raises IOError with errno == 4).
My problem is that the t thread still waits, and I'd like to propagate the interruption, so that its open() raises IOError too.
I found this in the Python docs:
"... only the main thread can set a new signal handler, and the main thread will be the only one to receive signals (this is enforced by the Python signal module, even if the underlying thread implementation supports sending signals to individual threads). This means that signals can’t be used as a means of inter-thread communication. Use locks instead."
Maybe you should also check these docs:
exceptions.KeyboardInterrupt
library/signal.html
One other idea is to use select to read the pipe asynchronously in the threads. This works on Linux; I'm not sure about Windows. (It's not the cleanest, nor the best, implementation.)
#!/usr/bin/python
import threading
import os
import select

def f():
    f = os.fdopen(os.open('pipe2', os.O_RDONLY | os.O_NONBLOCK))
    finput = [f]
    foutput = []
    # here the pipe is scanned and whatever comes in will be printed out
    # ...as long as 'getout' is False
    while finput and not getout:
        fread, fwrite, fexcep = select.select(finput, foutput, finput)
        for q in fread:
            if q in finput:
                s = q.read()
                if len(s) > 0:
                    print(s)

if __name__ == '__main__':
    getout = False
    t = threading.Thread(target=f)
    t.start()
    try:
        open('pipe1', 'r')
    except:
        getout = True
I need to debug a child process spawned by multiprocessing.Process(). The pdb debugger seems to be unaware of forking and unable to attach to already running processes.
Are there any smarter python debuggers which can be attached to a subprocess?
I've been searching for a simple solution to this problem and came up with this:
import sys
import pdb

class ForkedPdb(pdb.Pdb):
    """A Pdb subclass that may be used
    from a forked multiprocessing child
    """
    def interaction(self, *args, **kwargs):
        _stdin = sys.stdin
        try:
            sys.stdin = open('/dev/stdin')
            pdb.Pdb.interaction(self, *args, **kwargs)
        finally:
            sys.stdin = _stdin
Use it the same way you might use the classic Pdb:
ForkedPdb().set_trace()
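A minimal usage sketch (assumed, not part of the answer): call it inside the function that runs in the child process, and the (Pdb) prompt will appear on the parent's terminal.
import multiprocessing

def worker():
    x = 21 * 2                 # hypothetical value to inspect
    ForkedPdb().set_trace()    # drops into the debugger inside the child
    print(x)

if __name__ == '__main__':
    p = multiprocessing.Process(target=worker)
    p.start()
    p.join()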
Winpdb is pretty much the definition of a smarter Python debugger. It explicitly supports going down a fork, not sure it works nicely with multiprocessing.Process() but it's worth a try.
For a list of candidates to check for support of your use case, see the list of Python Debuggers in the wiki.
This is an elaboration of Romuald's answer which restores the original stdin using its file descriptor. This keeps readline working inside the debugger. Besides, pdb's special handling of KeyboardInterrupt is disabled, so that it does not interfere with multiprocessing's SIGINT handler.
import os
import pdb
import sys

class ForkablePdb(pdb.Pdb):
    _original_stdin_fd = sys.stdin.fileno()
    _original_stdin = None

    def __init__(self):
        pdb.Pdb.__init__(self, nosigint=True)

    def _cmdloop(self):
        current_stdin = sys.stdin
        try:
            if not self._original_stdin:
                self._original_stdin = os.fdopen(self._original_stdin_fd)
            sys.stdin = self._original_stdin
            self.cmdloop()
        finally:
            sys.stdin = current_stdin
Building upon #memplex's idea, I had to modify it to get it to work with joblib by setting sys.stdin in the constructor as well as passing it along directly via joblib.
import os
import pdb
import signal
import sys
import joblib

_original_stdin_fd = None

class ForkablePdb(pdb.Pdb):
    _original_stdin = None
    _original_pid = os.getpid()

    def __init__(self):
        pdb.Pdb.__init__(self)
        if self._original_pid != os.getpid():
            if _original_stdin_fd is None:
                raise Exception("Must set ForkablePdb._original_stdin_fd to stdin fileno")
            self.current_stdin = sys.stdin
            if not self._original_stdin:
                self._original_stdin = os.fdopen(_original_stdin_fd)
            sys.stdin = self._original_stdin

    def _cmdloop(self):
        try:
            self.cmdloop()
        finally:
            sys.stdin = self.current_stdin

def handle_pdb(sig, frame):
    ForkablePdb().set_trace(frame)

def test(i, fileno):
    global _original_stdin_fd
    _original_stdin_fd = fileno
    while True:
        pass

if __name__ == '__main__':
    print("PID: %d" % os.getpid())
    signal.signal(signal.SIGUSR2, handle_pdb)
    ForkablePdb().set_trace()
    fileno = sys.stdin.fileno()
    joblib.Parallel(n_jobs=2)(joblib.delayed(test)(i, fileno) for i in range(10))
remote-pdb can be used to debug sub-processes. After installation, put the following lines in the code you need to debug:
import remote_pdb
remote_pdb.set_trace()
remote-pdb will print a port number which will accept a telnet connection for debugging that specific process. There are some caveats around worker launch order, where stdout goes when using various frontends, etc. To ensure a specific port is used (must be free and accessible to the current user), use the following instead:
from remote_pdb import RemotePdb
RemotePdb('127.0.0.1', 4444).set_trace()
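You can then attach to that port with any plain TCP client, for example telnet 127.0.0.1 4444 or nc 127.0.0.1 4444.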
remote-pdb may also be launched via the built-in breakpoint() function in Python 3.7.
Just use PuDB, which gives you an awesome TUI (a GUI on the terminal) and supports multiprocessing as follows:
from pudb import forked; forked.set_trace()
An idea I had was to create "dummy" classes to fake the implementation of the methods you are using from multiprocessing:
from multiprocessing import Pool

class DummyPool():
    @staticmethod
    def apply_async(func, args, kwds):
        return DummyApplyResult(func(*args, **kwds))

    def close(self): pass
    def join(self): pass

class DummyApplyResult():
    def __init__(self, result):
        self.result = result

    def get(self):
        return self.result

def foo(a, b, switch):
    # set trace when DummyPool is used
    # import ipdb; ipdb.set_trace()
    if switch:
        return b - a
    else:
        return a - b

if __name__ == '__main__':
    pool = DummyPool()  # switch between Pool() and DummyPool() here
    results = []
    results.append(pool.apply_async(foo, args=(1, 100), kwds={'switch': True}))
    pool.close()
    pool.join()
    results[0].get()
Here is a version of ForkedPdb (Romuald's solution) which will work for both Windows and *nix based systems.
import sys
import pdb

if sys.platform == "win32":
    import win32console

class MyHandle():
    def __init__(self):
        self.screenBuffer = win32console.GetStdHandle(win32console.STD_INPUT_HANDLE)

    def readline(self):
        return self.screenBuffer.ReadConsole(1000)

class ForkedPdb(pdb.Pdb):
    def interaction(self, *args, **kwargs):
        _stdin = sys.stdin
        try:
            if sys.platform == "win32":
                sys.stdin = MyHandle()
            else:
                sys.stdin = open('/dev/stdin')
            pdb.Pdb.interaction(self, *args, **kwargs)
        finally:
            sys.stdin = _stdin
The problem here is that Python always connects sys.stdin in the child process to os.devnull to avoid contention for the stream. But this means that when the debugger (or a simple input()) tries to connect to stdin to get input from the user, it immediately reaches end-of-file and reports an error.
One solution, at least if you don't expect multiple debuggers to run at the same time, is to reopen stdin in the child process. That can be done by setting sys.stdin to open(0), which always opens the active terminal. This is in fact what the ForkedPdb solution does, but it can be done more simply and in an OS-independent manner like this:
import multiprocessing, sys

def main():
    process = multiprocessing.Process(target=worker)
    process.start()
    process.join()

def worker():
    # Python automatically closes sys.stdin for the subprocess, so we reopen
    # stdin. This enables pdb to connect to the terminal and accept commands.
    # See https://stackoverflow.com/a/30149635/3830997.
    sys.stdin = open(0)  # or os.fdopen(0)
    print("Hello from the subprocess.")
    breakpoint()  # or import pdb; pdb.set_trace()
    print("Exited from breakpoint in the subprocess.")

if __name__ == '__main__':
    main()
If you are on a supported platform, try DTrace. Most of the BSD / Solaris / OS X family support DTrace.
Here is an intro by the author. You can use DTrace to debug just about anything.
Here is a SO post on learning DTrace.