Basically, I am writing a script to automate a process. In the script I run a command which creates a child process (in parallel), and my script does not have control over that child process. I want to provide a series of inputs to the child process and then wait for it to end.
Dummy code for what I want to do:
import asyncio

async def main():
    process = await asyncio.create_subprocess_exec('python3', stdin=asyncio.subprocess.PIPE)
    print('hi1')
    process.stdin.write("import time \nprint('whatsup') \nfo = open('foo.txt', 'w') \nfo.write('Python is a great language.') \nfo.close() \ntime.sleep(3) \nprint('yo')".encode("utf-8"))
    print('end')
    await process.wait()

asyncio.run(main())
The program never ends in my case. I couldn't find a better way to communicate with the child process.
What works is to call
process.stdin.close()
before the await call.
Full code that works for me (I replaced python3 with py, which is the Python launcher on Windows for me):
import asyncio

async def main():
    process = await asyncio.create_subprocess_exec('py', stdin=asyncio.subprocess.PIPE)
    print('hi1')
    process.stdin.write("import time \nprint('whatsup') \nfo = open('foo.txt', 'w') \nfo.write('Python is a great language.') \nfo.close() \ntime.sleep(3) \nprint('yo')".encode("utf-8"))
    print('end')
    process.stdin.close()
    await process.wait()

asyncio.run(main())
Output:
hi1
end
whatsup
yo
and the foo.txt file is created.
This answer analyzed the Python source code of communicate, and it seems that it also closes standard input to make things work.
In each case _stdin_write() calls self.stdin.close() after self.stdin.write(input):
https://github.com/python/cpython/blob/master/Lib/subprocess.py#L793
I don't remember where I learned that trick myself or why it is needed, but it solves the problem. The limitation is that you cannot send multiple separate commands that way and interact with the process.
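Since communicate() bundles the write, the stdin close and the wait into a single call, a variant along these lines should behave the same (a minimal sketch of the code above, shortened script for brevity):

import asyncio

async def main():
    process = await asyncio.create_subprocess_exec(
        'py', stdin=asyncio.subprocess.PIPE)
    script = b"print('whatsup')\nprint('yo')\n"
    # communicate() writes the data, closes stdin and waits for the child
    await process.communicate(input=script)

asyncio.run(main())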
I'm experimenting with named pipes and async approaches and was a bit surprised at how slow reading the file I've created seems to be.
And as this question suggests, this effect is not limited to named pipes as in the example below, but applies to 'normal' files as well. Since my final goal is reading those named pipes, I prefer to keep the examples below.
So here is what I initially came up with:
import sys, os
from asyncio import create_subprocess_exec, gather, run
from asyncio.subprocess import DEVNULL
from aiofile import async_open

async def read_strace(namedpipe):
    with open("async.log", "w") as outfp:
        async with async_open(namedpipe, "r") as npfp:
            async for line in npfp:
                outfp.write(line)

async def main(cmd):
    try:
        myfifo = os.mkfifo('myfifo', 0o600)
        process = await create_subprocess_exec(
            "strace", "-o", "myfifo", *cmd,
            stdout=DEVNULL, stderr=DEVNULL)
        await gather(read_strace("myfifo"), process.wait())
    finally:
        os.unlink("myfifo")

run(main(sys.argv[1:]))
You can run it like ./async_program.py <CMD>, e.g. ./async_program.py find .
This one uses the ordinary subprocess.Popen and reads what strace writes to myfifo:
from subprocess import Popen, DEVNULL
import sys, os

def read_strace(namedpipe):
    with open("sync.log", "w") as outfp:
        with open(namedpipe, "r") as npfp:
            for line in npfp:
                outfp.write(line)

def main(cmd):
    try:
        myfifo = os.mkfifo('myfifo', 0o600)
        process = Popen(
            ["strace", "-o", "myfifo", *cmd],
            stdout=DEVNULL, stderr=DEVNULL)
        read_strace("myfifo")
    finally:
        os.unlink("myfifo")

main(sys.argv[1:])
Running both programs with time reveals that the async program is about 15x slower:
$ time ./async_program.py find .
poetry run ./async_program.py find . 4.06s user 4.75s system 100% cpu 8.727 total
$ time ./sync_program.py find .
poetry run ./sync_program.py find . 0.27s user 0.07s system 76% cpu 0.438 total
The linked question suggests that aiofile is known to be somewhat slow, but 15x? I'm pretty sure that I could come close to the synchronous approach by using an extra thread and writing to a queue, but admittedly I haven't tried it yet.
Is there a recommended way to read a file asynchronously, maybe even an approach more tailored to named pipes as I use them in the example above?
So async isn't magic. What async is good at is situations where you are calling something, or something is calling you, usually remotely, and there is I/O overhead and there are delays because of the network, file I/O, etc.
In your case, there won't be any I/O wait: a single process is reading a single file (named pipe or not).
So ALL your async is doing here is ADDING overhead: wrapping the work in an event loop and releasing control back to the loop over and over.
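If you still want to stay inside asyncio (for example because other tasks run in the same loop), one option the question already hints at is to push the blocking read into a worker thread instead of awaiting every line. A rough sketch, assuming Python 3.9+ for asyncio.to_thread and keeping the strace/FIFO setup from the question (the log file name is just an example):

import sys, os, asyncio
from asyncio.subprocess import DEVNULL

def read_strace_sync(namedpipe):
    # plain blocking reads, executed in a worker thread
    with open("hybrid.log", "w") as outfp, open(namedpipe, "r") as npfp:
        for line in npfp:
            outfp.write(line)

async def main(cmd):
    try:
        os.mkfifo("myfifo", 0o600)
        process = await asyncio.create_subprocess_exec(
            "strace", "-o", "myfifo", *cmd,
            stdout=DEVNULL, stderr=DEVNULL)
        # the event loop only schedules two awaited tasks, not one step per line
        await asyncio.gather(asyncio.to_thread(read_strace_sync, "myfifo"),
                             process.wait())
    finally:
        os.unlink("myfifo")

asyncio.run(main(sys.argv[1:]))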
I'm currently working on a POC with the following desired results:
a Python script working as a parent, meaning it starts a child process while running;
the child process is oblivious to the fact that another script is running it, and the very same child script can also be executed as the main script by the user;
a comfortable way to read the subprocess's output (written to sys.stdout via print), while the parent's inputs are sent to the child's sys.stdin (read via input).
I've already done some research on the topic and I am aware that I could pass subprocess.PIPE to Popen/run and call it a day.
However, I saw that multiprocessing.Pipe() produces a linked socket pair which allows sending objects through it as a whole, so I don't need to work out when to stop reading a stream and when to continue afterward.
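For reference, this is the kind of whole-object round trip I mean (a minimal standalone sketch, separate from my actual parent/child code below):

import multiprocessing

def child(conn):
    # whole Python objects arrive in one recv(), no manual stream framing needed
    print(conn.recv())
    conn.send({'reply': 'hello from child'})

if __name__ == '__main__':
    parent_conn, child_conn = multiprocessing.Pipe()
    p = multiprocessing.Process(target=child, args=(child_conn,))
    p.start()
    parent_conn.send(['any', 'picklable', 'object'])
    print(parent_conn.recv())
    p.join()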
# parent.py
import multiprocessing
import subprocess
import os

pipe1, pipe2 = multiprocessing.Pipe()

if os.fork():
    while True:
        print(pipe1.recv())
    exit()  # avoid fork collision

if os.fork():
    # subprocess.run is a busy wait
    subprocess.run(['python3', 'child.py'], stdin=pipe2.fileno(), stdout=pipe2.fileno())
    exit()  # avoid fork collision

while True:
    user_input = input('> ')
    pipe1.send(user_input)
# child.py
import os
import time

if os.fork():
    while True:
        print('child sends howdy')
        time.sleep(1)

with open('child.txt', 'w') as file:
    while True:
        user_input = input('> ')
        # We supposedly can't write to sys.stdout because parent.py took control of it
        file.write(f'{user_input}\n')
So, to finally reach the essence of the problem: child.py is installed as a package,
meaning parent.py doesn't invoke the actual file to run the script.
The subprocess is started by invoking the package.
And for some bizarre reason, when child.py runs as a package rather than as a script, the code written above doesn't seem to work.
child.py's sys.stdin and sys.stdout fail to work entirely; parent.py is unable to receive ANY of child.py's prints (even sys.stdout.write(<some_data>) followed by sys.stdout.flush()),
and the same applies to sys.stdin.
If anyone can shed any light on how to solve it, I would be delighted!
Side Note
When invoking a package, you don't invoke its main module (imagine it is __main__.py) directly;
you invoke a Python file which actually starts up the package.
I assume something fishy might be happening there when that occurs and that it causes the interference, but that's just a theory.
I'm running a program via subprocess.Popen and hitting an unexpected issue wherein the stdout does not print live, but instead waits for the program to finish. Oddly enough, this only occurs when the program being called is written in Python.
My control program (the one using subprocess) is as follows:
import subprocess
import os

print("Warming up")
pop = subprocess.Popen("python3 signaler.py", shell=True, stdout=subprocess.PIPE, stdin=subprocess.PIPE)
for line in pop.stdout:
    print(line)
print("I'm done")
Signaler.py:
import time
print("Running")
time.sleep(5)
print("Done running")
When I run the control program, the output is as follows. The code waits 5 seconds before printing Running, despite the fact that print("Running") occurs before the actual delay.
Warming up
*waits 5 seconds*
b'Running\n'
b'123\n'
I'm done
The strange thing is that when I modify the control program to instead run a Node program, the delay functions as expected and Running is printed 5 seconds before Done Running. The Node program is as follows:
const delay = require("delay")

async function run(){
    console.log("Running")
    await delay(5000)
    console.log("Done running")
}

run()
This issue doesn't occur when I use os.system to call signaler.py, and it still occurs when I use shell=False and modify the arguments accordingly. Any ideas what's causing this?
It sounds like Python is buffering the output of the print function and JavaScript isn't. You can force print statements to be flushed to stdout by calling print with the flush keyword argument in signaler.py:
print("Running", flush=True)
I read the question/answers/comments on A non-blocking read on a subprocess.PIPE in Python, but I felt they were a bit lacking.
When I implemented the solution provided, I noticed that the approach works best when the sub-process ends on its own. But if the subprocess is providing a stream of information and we are looking for a single match in its output, then that approach doesn't work for my needs (specifically on Windows, if that matters).
Here is my sample:
File ping.py
import time

def main():
    for x in range(100):
        print x
        time.sleep(1)

if __name__ == '__main__':
    print("Starting")
    time.sleep(2)
    main()
File runner.py
import subprocess
import time
import sys
from Queue import Queue, Empty
from threading import Thread

def enqueue_output(out, queue):
    for line in iter(out.readline, b''):
        queue.put(line)
    out.close()

# Start process we want to listen to
pPing = subprocess.Popen('ping.py',
                         shell=True,
                         stdin=subprocess.PIPE,
                         stdout=subprocess.PIPE,
                         )

q = Queue()
t = Thread(target=enqueue_output, args=(pPing.stdout, q))
t.daemon = True
t.start()

# Make sure it's started
print ("get the first line")
try:
    line = q.get()
except Empty:
    pass
else:
    print line.strip()

# look for the 'magic' output
print("empty the queue")
while not q.empty():
    line = q.get_nowait().strip()
    if (line == "3"):
        print("got it!")
        sys.exit()
    else:
        print("not yet")
My expectation is that the runner will make sure the process is started, then wait for the magic output and then stop, which it does. However, the longer the sub-process runs, the longer the runner runs; and even though the 'magic' output comes relatively quickly, I have to wait until the subprocess ends before anything gets processed.
What am I missing?
OK, if I understand correctly what you are trying to do, the problem is that ping is still a child process of runner. While you can make read calls non-blocking, the parent process will not actually exit while the child is still running. If you want runner not to wait for the child to finish, but instead to read the first line and the magic output and then exit, you need ping to disassociate itself from the parent process.
Look at this code sample to see how that is done: A simple Unix/Linux daemon in Python. Of course, you might skip the part where they close and re-open all the I/O streams.
On the same note, I am not sure that leaving an I/O stream open and connected to the parent will allow the parent to exit, so if that happens to be a problem you might have to figure out another way to exchange data.
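As a rough, untested sketch of the "disassociate" idea (not the daemon recipe itself, just launch flags that give a similar effect; start_new_session is POSIX-only and DETACHED_PROCESS is Windows-only, and the output file name is just an example), you could launch ping.py detached and let it write to a file that runner then polls:

import subprocess, sys

# assumption: ping.py writes to ping.out instead of a pipe back to the parent,
# so no open pipe ties the two processes together
with open("ping.out", "w") as out:
    if sys.platform == "win32":
        pPing = subprocess.Popen([sys.executable, "ping.py"], stdout=out,
                                 creationflags=subprocess.DETACHED_PROCESS)
    else:
        pPing = subprocess.Popen([sys.executable, "ping.py"], stdout=out,
                                 start_new_session=True)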
Consider the following Python code:
import io
import time
import subprocess
import sys
from thread import start_new_thread

def ping_function(ip):
    filename = 'file.log'
    command = ["ping", ip]
    with io.open(filename, 'wb') as writer, io.open(filename, 'rb', 1) as reader:
        process = subprocess.Popen(command, stdout=writer)
        while process.poll() is None:
            line = reader.read()
            # Do something with line
            sys.stdout.write(line)
            time.sleep(0.5)
        # Read the remaining
        sys.stdout.write(reader.read())

ping_function("google.com")
The goal is to run a shell command (in this case ping, but that is not relevant here) and to process its output in real time, which is also saved to a log file.
In other words, ping is running in the background and produces output on the terminal every second. My code reads this output (every 0.5 seconds), parses it and takes some action in (almost) real time.
Realtime here means that I don't want to wait for the end of the process to read the output. In this case ping actually never completes, so an approach like the one I have just described is mandatory.
I have tested the code above and it actually works OK :)
Now I'd like to run this in a separate thread, so I have replaced the last line with the following:
from thread import start_new_thread
start_new_thread(ping_function, ("google.com", ))
For some reason this does not work anymore, and the reader always returns empty strings.
In particular, the string returned by reader.read() is always empty.
Using a Queue or another global variable is not going to help, because I am having problems even retrieving the data in the first place (i.e. obtaining the output of the shell command).
My questions are:
How can I explain this behavior?
Is it a good idea to run a process inside a separate thread, or should I use a different approach? This article suggests that it is not...
How can I fix the code?
Thanks!
You should never fork after starting threads. You can thread after starting a fork, so you can have a thread handle the I/O piping, but...
Let me repeat this: you should never fork after starting threads.
That article explains it pretty well. You don't have control over the state of your program once you start threads, especially in Python, with things going on in the background.
To fix your code, just start the subprocess from the main thread, then start threading. It's perfectly OK to process the I/O from the pipes in a thread.
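A rough sketch of that restructuring (Python 3 flavoured, using the threading module instead of the old thread module, and untested): the Popen happens in the main thread, and only the polling read is handed to a thread.

import io
import sys
import time
import subprocess
from threading import Thread

def read_output(process, reader):
    # only I/O work happens inside the thread
    while process.poll() is None:
        sys.stdout.write(reader.read())
        time.sleep(0.5)
    sys.stdout.write(reader.read())  # read the remaining output

def ping_function(ip):
    filename = 'file.log'
    command = ["ping", ip]
    with io.open(filename, 'wb') as writer, io.open(filename, 'rb', 1) as reader:
        # the fork/exec happens in the main thread, before any thread is started
        process = subprocess.Popen(command, stdout=writer)
        t = Thread(target=read_output, args=(process, reader), daemon=True)
        t.start()
        # the main thread is free to do other work here;
        # join() just keeps the files open until the reader is done
        t.join()

ping_function("google.com")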