Consider the following Python code:
import io
import time
import subprocess
import sys
from thread import start_new_thread

def ping_function(ip):
    filename = 'file.log'
    command = ["ping", ip]
    with io.open(filename, 'wb') as writer, io.open(filename, 'rb', 1) as reader:
        process = subprocess.Popen(command, stdout=writer)
        while process.poll() is None:
            line = reader.read()
            # Do something with line
            sys.stdout.write(line)
            time.sleep(0.5)
        # Read the remaining
        sys.stdout.write(reader.read())

ping_function("google.com")
The goal is to run a shell command (in this case ping, but the specific command is not relevant here) and to process its output in real time, while also saving it to a log file.
In other words, ping runs in the background and produces output every second. My code reads this output (every 0.5 seconds), parses it and takes some action in (almost) real time.
Real time here means that I don't want to wait for the process to end before reading the output. In this case ping never completes, so an approach like the one I have just described is mandatory.
I have tested the code above and it actually works OK :)
Now I'd like to run this in a separate thread, so I have replaced the last line with the following:
from thread import start_new_thread
start_new_thread(ping_function, ("google.com", ))
For some reason this no longer works: the string returned by reader.read() is always empty.
Using a Queue or another global variable is not going to help, because I am having trouble even retrieving the data in the first place (i.e. obtaining the output of the shell command).
My questions are:
How can I explain this behavior?
Is it a good idea to run a process inside a separate thread, or should I use a different approach? This article suggests that it is not...
How can I fix the code?
Thanks!
You should never fork after starting threads. You can thread after starting a fork, so you can have a thread handle the I/O piping, but...
Let me repeat this: you should never fork after starting threads.
That article explains it pretty well. You don't have control over the state of your program once you start threads, especially in Python, with things going on in the background.
To fix your code, just start the subprocess from the main thread, then start threading. It's perfectly OK to process the I/O from the pipes in a thread.
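A minimal sketch of that ordering, written for Python 3 rather than the Python 2 of the question: the subprocess is created first, from the main thread, and only then is a thread started to drain its pipe. The names pump and run_and_collect are hypothetical, and the command passed in stands in for ping.

```python
import subprocess
import threading


def pump(pipe, sink):
    """Read lines from the child's pipe until EOF and store them in sink."""
    for line in iter(pipe.readline, b""):
        sink.append(line)
    pipe.close()


def run_and_collect(command):
    """Start the child from the calling (main) thread, then read it in a thread."""
    # 1. Create the subprocess before any thread exists.
    process = subprocess.Popen(command, stdout=subprocess.PIPE)
    # 2. Only now start the thread that handles the I/O.
    lines = []
    reader = threading.Thread(target=pump, args=(process.stdout, lines))
    reader.start()
    process.wait()
    reader.join()
    return lines
```

For a command that never finishes, such as ping, the caller would poll the collected lines (or a queue) instead of waiting for the process to exit.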
Related
I'm currently working on a proof of concept (POC) with the following desired results:
a Python script works as a parent, meaning it starts a child process while it is running
the child process is oblivious to the fact that another script is running it; the very same child script can also be executed as the main script by the user
a comfortable way to read the subprocess's output (written to sys.stdout via print) and to send the parent's input to the child's sys.stdin (read via input)
I've already done some research on the topic, and I am aware that I can pass subprocess.PIPE to Popen/run and call it a day.
However, I saw that multiprocessing.Pipe() produces a linked socket pair that allows whole objects to be sent through it, so I don't need to work out when to stop reading a stream and when to continue afterward.
# parent.py
import multiprocessing
import subprocess
import os

pipe1, pipe2 = multiprocessing.Pipe()

if os.fork():
    while True:
        print(pipe1.recv())
    exit()  # avoid fork collision

if os.fork():
    # subprocess.run blocks until the child exits
    subprocess.run(['python3', 'child.py'], stdin=pipe2.fileno(), stdout=pipe2.fileno())
    exit()  # avoid fork collision

while True:
    user_input = input('> ')
    pipe1.send(user_input)

# child.py
import os
import time

if os.fork():
    while True:
        print('child sends howdy')
        time.sleep(1)

with open('child.txt', 'w') as file:
    while True:
        user_input = input('> ')
        # We supposedly can't write to sys.stdout because parent.py took control of it
        file.write(f'{user_input}\n')
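As an aside, the whole-object behaviour of multiprocessing.Pipe() mentioned above can be seen in isolation with a small sketch. The helper name round_trip is hypothetical, and both ends are used inside a single process purely for illustration.

```python
import multiprocessing


def round_trip(message):
    """Send one Python object through a Pipe and receive it on the other end."""
    parent_end, child_end = multiprocessing.Pipe()
    try:
        parent_end.send(message)   # the object is pickled and sent as one message
        return child_end.recv()    # blocks until one whole message has arrived
    finally:
        parent_end.close()
        child_end.close()
```

Note that recv() is documented to return an object that was sent with send() from the other end, so plain text that a child prints onto the same file descriptor would presumably not arrive as a well-formed message.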
So, to finally reach the essence of the problem: child.py is installed as a package,
meaning parent.py doesn't call the actual file to run the script.
The subprocess is run by invoking the package.
And for some bizarre reason, when child.py is a package rather than a script, the code above doesn't seem to work.
child.py's sys.stdin and sys.stdout fail to work entirely: parent.py is unable to receive ANY of child.py's prints (even with sys.stdout.write(<some_data>) followed by sys.stdout.flush()),
and the same applies to sys.stdin.
If anyone can shed any light on how to solve this, I would be delighted!
Side Note
When calling a package, you don't call its __main__.py directly;
you call a Python file which in turn starts up the package.
I assume something fishy might be happening at that point and that this is what causes the interference, but that's just a theory.
I'd appreciate some help with threading, which I'm pretty new to.
The example code is not exactly what I’m doing (‘notepad’ and ‘calc’ are just example commands), but a simplified version that shows my problem.
I want to run two separate threads that each run a different command a number of times. I would like the code to do this:
Start the first instances of ‘notepad’ and ‘calc’ simultaneously (which it does).
When I close an instance of ‘notepad’, open the next instance of ‘notepad’.
When I close an instance of ‘calc’, open the next instance of ‘calc’.
[edit] I want the script to wait until both threads have finished, as it needs to do some processing of the output from these.
However, when I close an instance of ‘notepad’, the next instance of ‘notepad’ does not start until I’ve closed the current instance of ‘calc’, and vice versa. With a bit of debugging, it looks like the process (from Popen) for the closed instance of 'notepad' doesn't finish until the current 'calc' is closed.
Running Python 2.7 on Windows 7
Example Code:
from subprocess import Popen, PIPE, STDOUT
from threading import Thread

def do_commands(command_list):
    for command in command_list:
        proc = Popen("cmd.exe", stdin=PIPE, stdout=PIPE, stderr=STDOUT)
        stdout_value, stderr_value = proc.communicate(input=command)

# MAIN CODE
A_command_list = ["notepad\n", "notepad\n", "notepad\n" ]
B_command_list = ["calc\n", "calc\n", "calc\n" ]
A_args = [A_command_list]
B_args = [B_command_list]
A_thread = Thread(target=do_commands, args=(A_args))
B_thread = Thread(target=do_commands, args=(B_args))
A_thread.start()
B_thread.start()
A_thread.join()
B_thread.join()
Thanks in advance :-)
Nick
So the communicate() method is apparently waiting for all of the cmd.exe processes that Popen started at nearly the same time to terminate. Since the cmd.exe that runs Calculator starts at nearly the same time as the cmd.exe that runs Notepad, both communicate() calls (one in A_thread and one in B_thread) wait until both processes terminate. Thus neither for loop advances until both processes have terminated.
Adding a delay between starting the two threads fixes the problem.
So, leaving your original code unchanged and adding
sleep(1)
(with from time import sleep at the top) between the two Thread starts produces the desired behavior.
On my system, a delay of 0.0001 seconds reliably fixed the problem, whereas a delay of 0.00001 seconds did not.
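The staggered start described above could be sketched as follows, in Python 3 and with platform-neutral commands in place of notepad and calc. The function run_staggered and its delay parameter are hypothetical names, and this is a sketch of the workaround rather than an explanation of why the delay helps.

```python
import subprocess
import threading
import time


def do_commands(command_list, results):
    """Run each command in turn, waiting for it to finish before the next."""
    for command in command_list:
        proc = subprocess.Popen(command, stdout=subprocess.PIPE,
                                stderr=subprocess.STDOUT)
        output, _ = proc.communicate()
        results.append(output)


def run_staggered(a_commands, b_commands, delay=1.0):
    """Start two worker threads with a pause between the two starts."""
    a_results, b_results = [], []
    a_thread = threading.Thread(target=do_commands, args=(a_commands, a_results))
    b_thread = threading.Thread(target=do_commands, args=(b_commands, b_results))
    a_thread.start()
    time.sleep(delay)  # the delay between the two Thread starts
    b_thread.start()
    a_thread.join()
    b_thread.join()
    return a_results, b_results
```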
I have two scripts in Python.
sub.py code:
import time
import subprocess as sub

while 1:
    value=input("Input some text or number") # it is example, and I don't care about if it is number-input or text-raw_input, just input something
    proces=sub.Popen(['sudo', 'python', '/home/pi/second.py'],stdin=sub.PIPE)
    proces.stdin.write(value)
second.py code:
import sys

list_args = []
while 1:
    from_sub = sys.stdin.readline()  # read one line sent by sub.py
    list_args.append(from_sub)
    for i in list_args:
        print i
First I execute sub.py and input something; then second.py executes and prints everything I have input so far, again and again...
The thing is, I don't want to open a new process each time. There should be only one process. Is it possible?
Give me a hand :)
This problem can be solved by using Pexpect. Check my answer here, which solves a similar problem:
https://stackoverflow.com/a/35864170/5134525.
Another way to do it is to use Popen from the subprocess module and set stdin and stdout to pipes. Modifying your code a bit can give you the desired result:
from subprocess import Popen, PIPE

# This part should be outside the loop
args = ['sudo', 'python', '/home/pi/second.py']
process = Popen(args, stdin=PIPE, stdout=PIPE)

while True:
    value = input("Input some text or number")
    # End with a newline so the child's readline() returns, then flush
    process.stdin.write(str(value) + "\n")
    process.stdin.flush()
You need to open the process outside the loop for this to work. A similar issue is addressed in the question "Keep a subprocess alive and keep giving it commands? Python", in case you want to check it.
This approach will lead to an error if the child process quits after the first iteration and closes all the pipes. You somehow need to keep the child process blocked, waiting to accept more input. You can do this either by using threads or by using the first option, i.e. Pexpect.
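A minimal Python 3 sketch of the single-long-lived-child idea, under stated assumptions: the child here is an inline stand-in for second.py (CHILD_CODE) that loops on readline and echoes each line back, and start_child and send_line are hypothetical helper names.

```python
import subprocess
import sys

# Stand-in for second.py: keeps reading lines until stdin is closed.
CHILD_CODE = (
    "import sys\n"
    "for line in iter(sys.stdin.readline, ''):\n"
    "    sys.stdout.write('got: ' + line)\n"
    "    sys.stdout.flush()\n"
)


def start_child():
    """Start the child once, outside any loop, with both ends piped."""
    return subprocess.Popen(
        [sys.executable, "-c", CHILD_CODE],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        universal_newlines=True,  # text mode: str in, str out
    )


def send_line(process, text):
    """Send one line to the running child and read one line of reply."""
    process.stdin.write(text + "\n")
    process.stdin.flush()
    return process.stdout.readline().rstrip("\n")
```

The same process object is reused for every input; closing process.stdin is what finally lets the child's loop end.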
I have a set of command line tools that I'd like to run in parallel on a series of files. I've written a python function to wrap them that looks something like this:
import os
import shlex
import subprocess

def process_file(fn):
    print os.getpid()
    cmd1 = "echo "+fn
    p = subprocess.Popen(shlex.split(cmd1))
    # after cmd1 finishes
    other_python_function_to_do_something_to_file(fn)
    cmd2 = "echo "+fn
    p = subprocess.Popen(shlex.split(cmd2))
    print "finish"

if __name__=="__main__":
    import multiprocessing
    p = multiprocessing.Pool()
    for fn in files:
        RETURN = p.apply_async(process_file,args=(fn,),kwds={some_kwds})
While this works, it does not seem to be running multiple processes; it seems like it's just running in serial (I've tried using Pool(5) with the same result). What am I missing? Are the calls to Popen "blocking"?
EDIT: Clarified a little. I need cmd1, then some python command, then cmd2, to execute in sequence on each file.
EDIT2: The output from the above has the pattern:
pid
finish
pid
finish
pid
finish
whereas a similar call, using map in place of apply (but without any provision for passing kwds) looks more like
pid
pid
pid
finish
finish
finish
However, the map call sometimes (always?) hangs after apparently succeeding.
Are the calls to Popen "blocking"?
No. Just creating a subprocess.Popen returns immediately, giving you an object that you could wait on or otherwise use. If you want to block, that's simple:
subprocess.check_call(shlex.split(cmd1))
Meanwhile, I'm not sure why you're putting your args together into a string and then trying to shlex them back to a list. Why not just write the list?
cmd1 = ["echo", fn]
subprocess.check_call(cmd1)
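The difference can be made visible with a small timing sketch (Python 3). The SLEEPER command and the two helper functions are hypothetical stand-ins: the child simply sleeps for one second.

```python
import subprocess
import sys
import time

# Hypothetical child command that takes about one second to finish.
SLEEPER = [sys.executable, "-c", "import time; time.sleep(1.0)"]


def time_popen():
    """Seconds spent inside the Popen() call itself (it does not wait)."""
    start = time.monotonic()
    proc = subprocess.Popen(SLEEPER)
    elapsed = time.monotonic() - start
    proc.wait()  # reap the child afterwards; not part of the measurement
    return elapsed


def time_check_call():
    """Seconds spent inside check_call(), which waits for the child to exit."""
    start = time.monotonic()
    subprocess.check_call(SLEEPER)
    return time.monotonic() - start
```

time_check_call() should report at least the child's run time, while time_popen() reports only the cost of starting the process.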
While this works, it does not seem to be running multiple processes; it seems like it's just running in serial
What makes you think this? Given that each process just kicks off two processes into the background as fast as possible, it's going to be pretty hard to tell whether they're running in parallel.
If you want to verify that the work is being done by multiple processes, you may want to add some prints or logging (and throw something like os.getpid() into the messages).
Meanwhile, it looks like you're trying to exactly duplicate the effects of multiprocessing.Pool.map_async out of a loop around multiprocessing.Pool.apply_async, except that instead of accumulating the results you're stashing each one in a variable called RETURN and then throwing it away before you can use it. Why not just use map_async?
Finally, you asked whether multiprocessing is the right tool for the job. Well, you clearly need something asynchronous: check_call(args(file1)) has to block other_python_function_to_do_something_to_file(file1), but at the same time not block check_call(args(file2)).
I would probably have used threading, but really, it doesn't make much difference. Even if you're on a platform where process startup is expensive, you're already paying that cost because the whole point is running N * M child processes, so another pool of 8 isn't going to hurt anything. And there's little risk of either accidentally creating races by sharing data between threads, or accidentally creating code that looks like it shares data between processes but doesn't, since there's nothing to share. So, whichever one you like more, go for it.
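A sketch of the threading variant in Python 3, using concurrent.futures as one possible thread pool. The ECHO command is a hypothetical cross-platform stand-in for the question's echo, the middle step is left as a comment, and process_all is an assumed name.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for the question's "echo" command.
ECHO = [sys.executable, "-c", "import sys; print(sys.argv[1])"]


def process_file(fn):
    """Run cmd1, then a Python step, then cmd2, strictly in that order."""
    subprocess.check_call(ECHO + [fn], stdout=subprocess.DEVNULL)  # cmd1, blocks
    # ... the Python step that does something to the file would go here ...
    subprocess.check_call(ECHO + [fn], stdout=subprocess.DEVNULL)  # cmd2, blocks
    return fn


def process_all(files, workers=8):
    """Process several files concurrently; each file's steps stay sequential."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(process_file, files))
```

Each worker thread spends its time blocked in check_call, so the files are processed concurrently while the steps for any one file stay in order.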
The other alternative would be to write an event loop. Which I might actually start doing myself for this problem, but I'd regret it, and you shouldn't do it…
I have been trying to write an application that runs subprocesses and (among other things) displays their output in a GUI and allows the user to click a button to cancel them. I start the processes like this:
queue = Queue.Queue(500)
process = subprocess.Popen(
    command,
    stdout=subprocess.PIPE,
    stderr=subprocess.STDOUT)
iothread = threading.Thread(
    target=simple_io_thread,
    args=(process.stdout, queue))
iothread.daemon=True
iothread.start()
where simple_io_thread is defined as follows:
def simple_io_thread(pipe, queue):
    while True:
        line = pipe.readline()
        queue.put(line, block=True)
        if line=="":
            break
This works well enough. In my UI I periodically do non-blocking "get"s from the queue. However, my problems come when I want to terminate the subprocess. (The subprocess is an arbitrary process, not something I wrote myself.) I can use the terminate method to terminate the process, but I do not know how to guarantee that my I/O thread will terminate. It will normally be doing blocking I/O on the pipe. This may or may not end some time after I terminate the process. (If the subprocess has spawned another subprocess, I can kill the first subprocess, but the second one will still keep the pipe open. I'm not even sure how to get such grand-children to terminate cleanly.) After that the I/O thread will try to enqueue the output, but I don't want to commit to reading from the queue indefinitely.
Ideally I would like some way to request termination of the subprocess, block for a short (<0.5s) amount of time and after that be guaranteed that the I/O thread has exited (or will exit in a timely fashion without interfering with anything else) and that I can stop reading from the queue.
It's not critical to me that a solution uses an I/O thread. If there's another way to do this that works on Windows and Linux with Python 2.6 and a Tkinter GUI that would be fine.
EDIT - Will's answer and other things I've seen on the web about doing this in other languages suggest that the operating system expects you just to close the file handle on the main thread and then the I/O thread should come out of its blocking read. However, as I described in the comment, that doesn't seem to work for me. If I do this on the main thread:
process.stdout.close()
I get:
IOError: close() called during concurrent operation on the same file object.
...on the main thread. If I do this on the main thread:
os.close(process.stdout.fileno())
I get:
close failed in file object destructor: IOError: [Errno 9] Bad file descriptor
...later on in the main thread when it tries to close the file handle itself.
I know this is an old post, but in case it still helps anyone, I think your problem could be solved by passing the subprocess.Popen instance to io_thread, rather than its output stream.
If you do that, then you can replace your while True: line with while process.poll() is None:.
process.poll() checks for the subprocess return code; if the process hasn't finished, then there isn't one (i.e. process.poll() is None). You can then do away with if line == "": break.
The reason I'm here is that I wrote a very similar script today, and I got these errors:
IOError: close() called during concurrent operation on the same file object.
Again, in case it helps, I think my problems stem from (my) io_thread doing some overly efficient garbage collection and closing a file handle I give it (I'm probably wrong, but it works now). Mine is different, though, in that it's not daemonic, and it iterates through subprocess.stdout rather than using a while loop, i.e.:
def io_thread(subprocess,logfile,lock):
    for line in subprocess.stdout:
        lock.acquire()
        print line,
        lock.release()
        logfile.write( line )
I should also probably mention that I pass the bufsize argument to subprocess.Popen, so that it's line buffered.
This is probably old enough, but still useful to someone coming from a search engine...
The reason it shows that message is that the file descriptors are closed once the subprocess has completed; the daemon thread (which is running concurrently) then tries to use those closed descriptors, which raises the error.
Joining the thread before calling the subprocess's wait() or communicate() method should be more than enough to suppress the error.
my_thread.join()
print my_thread.is_alive()
my_popen.communicate()
In the code that terminates the process, you could also explicitly os.close() the pipe that your thread is reading from?
You should close the write end of the pipe instead... but with the code as written you cannot access it. To do so you should:
create a pipe
pass the write end's file descriptor to Popen's stdout
use the read end of the pipe in simple_io_thread to read lines.
Now you can close the write end and the read thread will shut down gracefully.
queue = Queue.Queue(500)
r, w = os.pipe()
process = subprocess.Popen(
    command,
    stdout=w,
    stderr=subprocess.STDOUT)
iothread = threading.Thread(
    target=simple_io_thread,
    args=(os.fdopen(r), queue))
iothread.daemon=True
iothread.start()
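Now, by calling
os.close(w)
you can close your end of the pipe, and iothread will shut down without any exception (once the child process, which holds its own copy of the write end, has exited as well).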
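A Python 3 sketch of the same idea, with one deliberate difference that is an assumption of this sketch rather than part of the answer above: the parent closes its copy of the write end immediately after Popen, so the reader sees end-of-file as soon as the child (and anything else holding the write end) has exited. The name start_with_own_pipe is hypothetical.

```python
import os
import queue
import subprocess
import threading


def simple_io_thread(pipe, out_queue):
    """Forward lines to the queue until EOF, then enqueue "" as a sentinel."""
    for line in iter(pipe.readline, ""):
        out_queue.put(line)
    out_queue.put("")
    pipe.close()


def start_with_own_pipe(command):
    """Start command with its output going into a pipe that we created."""
    r, w = os.pipe()
    process = subprocess.Popen(command, stdout=w, stderr=subprocess.STDOUT)
    os.close(w)  # drop the parent's copy of the write end
    out_queue = queue.Queue(500)
    iothread = threading.Thread(target=simple_io_thread,
                                args=(os.fdopen(r), out_queue))
    iothread.daemon = True
    iothread.start()
    return process, iothread, out_queue
```

After process.terminate(), a short iothread.join(timeout) tells the caller whether the reader has actually finished; a grandchild that still holds the write end open would keep it alive.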