Python input() blocks subprocesses from executing

I have a Python script that accepts user input. Different user inputs trigger different functionality. The functionality in question here is one that spawns multiple processes. Here is the script, main.py.
import time
import threading
import concurrent.futures as cf

def executeparallelprocesses():
    numprocesses = 2
    durationseconds = 10
    futures = []
    print('Submit jobs as new processes.')
    with cf.ProcessPoolExecutor(max_workers=numprocesses) as executor:
        for i in range(numprocesses):
            futures.append(executor.submit(workcpu, 500, durationseconds))
            print('job submitted')
        print('all jobs submitted')
        print('Wait for jobs to complete.', flush=True)
        for future in cf.as_completed(futures):
            future.result()
        print('All jobs done.', flush=True)

def workcpu(x, durationseconds):
    print('Job executing in new process.')
    start = time.time()
    while time.time() - start < durationseconds:
        x * x

def main():
    while True:
        cmd = input('Press ENTER\n')
        if cmd == 'q':
            break
        thread = threading.Thread(target=executeparallelprocesses)
        thread.start()
        time.sleep(15)

if __name__ == '__main__':
    main()
When this script is invoked from the terminal, it works as expected (i.e., the subprocesses execute). Specifically, notice the two lines "Job executing in new process." in the example run that follows:
(terminal prompt $) python3 main.py
Press ENTER
Submit jobs as new processes.
Press ENTER
job submitted
job submitted
all jobs submitted
Wait for jobs to complete.
Job executing in new process.
Job executing in new process.
All jobs done.
q
(terminal prompt $)
THE PROBLEM:
When the script is invoked from another program, the subprocesses are not executed. Here is the driver script, driver.py:
import time
import subprocess
from subprocess import PIPE
args = ['python3', 'main.py']
p = subprocess.Popen(args, bufsize=0, stdin=PIPE, universal_newlines=True)
time.sleep(1)
print('', file=p.stdin, flush=True)
time.sleep(1)
print('q', file=p.stdin, flush=True)
time.sleep(20)
Notice how "Job executing in new process." is not present in the output from the example run that follows:
(terminal prompt $) python3 driver.py
Press ENTER
Submit jobs as new processes.
Press ENTER
job submitted
job submitted
all jobs submitted
Wait for jobs to complete.
(terminal prompt $)
It seems like the cmd = input('Press ENTER\n') statement in main.py is blocking and preventing the subprocesses from executing. Strangely, commenting out the second time.sleep(1) statement in driver.py causes the main.py subprocesses to spawn as expected. Another way to make this "work" is to add time.sleep(1) inside the loop of main.py, right after thread.start().
This time-sensitive code is brittle. Is there a robust way to do this?

The problem lies in how you try to communicate with the second script using stdin=PIPE. Try the following instead for the driver script:
import time
import subprocess
from subprocess import PIPE
args = ['python3', 'main.py']
p = subprocess.Popen(args, bufsize=0, stdin=PIPE, universal_newlines=True)
p.communicate(input='\nq\n')
time.sleep(20)
Output:
Press ENTER
Submit jobs as new processes.
Press ENTER
job submitted
job submitted
all jobs submitted
Wait for jobs to complete.
Job executing in new process.
Job executing in new process.
All jobs done.
Process finished with exit code 0
Note that, instead of inserting timeouts everywhere, you should probably look into joining completed processes, but that goes beyond the question.
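For example, a minimal sketch of the driver with no fixed sleeps at all, assuming main.py from the question: communicate() sends both inputs, closes stdin, and only returns once main.py has exited (its worker thread is non-daemon, so the process stays alive until the jobs finish).
import subprocess
from subprocess import PIPE

args = ['python3', 'main.py']
p = subprocess.Popen(args, bufsize=0, stdin=PIPE, universal_newlines=True)
p.communicate(input='\nq\n')  # ENTER triggers a run, 'q' tells main.py to quit; blocks until main.py exits
print('main.py finished with return code', p.returncode)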

I tried ShadowRanger's suggestion to add a call to multiprocessing.set_start_method():
import multiprocessing

if __name__ == '__main__':
    multiprocessing.set_start_method('spawn')
    main()
This solved the problem for me. I will read the documentation to learn more about this.
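For reference, a related approach (not from the original answer, available since Python 3.7) is to give just this executor a 'spawn' context via the mp_context argument instead of changing the global start method; a minimal sketch reusing workcpu from the question:
import multiprocessing
import concurrent.futures as cf

if __name__ == '__main__':
    ctx = multiprocessing.get_context('spawn')
    # only this executor uses the 'spawn' start method; the rest of the program is unaffected
    with cf.ProcessPoolExecutor(max_workers=2, mp_context=ctx) as executor:
        futures = [executor.submit(workcpu, 500, 10) for _ in range(2)]
        for future in cf.as_completed(futures):
            future.result()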

Related

ProcessPoolExecutor not limiting to set value

I have a number of computation processes that need to be run. They take anywhere from 20 minutes to 1+ days. I want the user to be able to observe what each is doing through the standard output, so I am executing each in its own cmd window. When I set the number of workers, it does not observe that value and keeps on spinning up more and more processes until I cancel the program.
def run_job(args):
    os.system("start cmd /k \"{} > \"{}\\stdout.txt\"\"".format(run_command, outpath))

CONCURRENCY_HANDLER = concurrent.futures.ProcessPoolExecutor(max_workers=3)
jobs = []
ALL_RUNS_MATRIX = [{k1: v1, ..., kn: vn},
                   ...,
                   {kA1: vA1, ..., kAn: vAn}]
with CONCURRENCY_HANDLER as executor:
    for idx, configuration in enumerate(ALL_RUNS_MATRIX):
        generate_run_specific_files(configuration, idx)
        args = [doesnt, matter]
        time.sleep(5)
        print("running new")
        jobs.append(executor.submit(run_job, args))
        time.sleep(10)
I originally tried using ThreadPoolExecutor, to the same effect. Why is this not actually limiting the number running concurrently, and if this won't work, what should I use instead? I need to retain this "generate -> wait -> run" path because of the nature of the program (I change a file that it reads for config, it starts, retains all necessary info in memory, then executes), so I am wary of the "workers pull their work off a queue as they become available" model.
Not quite sure what you're trying to do. Maybe give us an example with a simple task that shows the same issue with processes? Are you thinking of max_workers as an upper bound on the number of processes spawned? It is an upper bound on the pool's worker processes, which defaults to the number of processors. According to the docs,
If max_workers is None or not given, it will default to the number of processors on the machine. If max_workers is less than or equal to 0, then a ValueError will be raised. On Windows, max_workers must be less than or equal to 61. If it is not then ValueError will be raised. If max_workers is None, then the default chosen will be at most 61, even if more processors are available.
Here is a simple example,
from concurrent.futures import ProcessPoolExecutor
from time import sleep
futures = []
def job(i):
print('Job started: ' + str(i))
return i
def all_done():
done = True
for ft in futures:
done = done and ft.done()
return done
with ProcessPoolExecutor(max_workers=8) as executor:
for i in range(3):
futures.append(executor.submit(job, i))
while not all_done():
sleep(0.1)
for ft in futures:
print('Job done: ' + str(ft.result()))
It prints,
Job started: 0
Job started: 1
Job started: 2
Job done: 0
Job done: 1
Job done: 2
Does this help?
As I mentioned in my comment, as soon as the start command is satisfied by opening the new command window, the system() call returns as completed, even though the run command passed to cmd /K has only just started running. The process in the pool is therefore free to run another task.
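In other words, the function handed to the pool has to block for the whole run. A rough sketch of that idea, reusing the run_command and outpath names from the question (and giving up the separate visible window that start cmd /k provides):
import subprocess

def run_job_blocking(run_command, outpath):
    # redirect output to a file and wait for the command itself,
    # so this pool worker stays busy until the run really finishes
    with open(outpath + r'\stdout.txt', 'w') as log:
        subprocess.run(run_command, shell=True, stdout=log, stderr=subprocess.STDOUT)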
If I understand correctly your problem, you have the following goals:
Detect the true completion of your command so that you ensure that no more than 3 commands are running concurrently.
Collect the output of the command in a window that will remain open even after the command has completed. I infer this from your having used the /K switch when invoking cmd.
My solution would be to use windows created by tkinter to hold your output and to use subprocess.Popen to run your commands with the argument shell=True. You can specify the additional argument stdout=PIPE to read the output from a command and funnel it to the tkinter window. How to actually do that is the challenge.
I have not done tkinter programming before and perhaps someone with more experience could find a more direct method. It seems to me that the windows need to be created and written to in the main thread. To that end, for every command to be executed, a window (a special subclass of Tk called CmdWindow) is created and paired with that command. The command and the output window number are passed to a worker function run_command, along with an instance of queue.Queue. run_command then uses subprocess.Popen to execute the command and, for every line of output it reads from the output pipe, writes a tuple to the queue holding the window number and the line to be written. The main thread loops reading these tuples and writing the lines to the appropriate window. Because the main thread is occupied with writing command output, a separate thread is used to create a thread pool, submit all the commands that need to be run, and wait for their completion. When all tasks are complete, a special "end" record is added to the queue, signifying to the main thread that it can stop reading from the queue. At that point the main thread displays a 'Pausing for termination...' message and will not terminate until the user enters a carriage return at the console.
from concurrent.futures import ThreadPoolExecutor, as_completed
from subprocess import Popen, PIPE
from tkinter import *
from tkinter.scrolledtext import ScrolledText
from queue import Queue
from threading import Thread

class CmdWindow(Tk):
    """ A console window """
    def __init__(self, cmd):
        super().__init__()
        self.title(cmd)
        self.configure(background="#BAD0EF")
        title = Entry(self, relief=FLAT, bg="#BAD0EF", bd=0)
        title.pack(side=TOP)
        textArea = ScrolledText(self, height=24, width=120, bg="#FFFFFF", font=('consolas', '14'))
        textArea.pack(expand=True, fill='both')
        textArea.bind("<Key>", lambda e: "break")  # read only
        self._textArea = textArea

    def write(self, s):
        """ write the next line of output """
        self._textArea.insert(END, s)
        self.update()

def run_command(q, cmd, win):
    """ run command cmd with output window win """
    # special "create window" command:
    q.put((win, None))  # create the window
    with Popen(cmd, stdout=PIPE, shell=True, text=True) as proc:
        for line in iter(proc.stdout.readline, ''):
            # write line command:
            q.put((win, line))

def run_tasks(q, arguments):
    # we only need a thread pool since each command will be its own process:
    with ThreadPoolExecutor(max_workers=3) as executor:
        futures = []
        for win, cmd in arguments:
            futures.append(executor.submit(run_command, q, cmd, win))
        # each task doesn't currently return anything
        results = [future.result() for future in as_completed(futures)]
    q.put(None)  # signify end

def main():
    q = Queue()
    # sample commands to execute (under Windows):
    cmds = ['dir *.py', 'dir *.html', 'dir *.txt', 'dir *.js', 'dir *.csv']
    # each command will get its own window for output:
    windows = list(cmds)
    # pair a command with a window number:
    arguments = enumerate(cmds)
    # create the thread for running the commands:
    thread = Thread(target=run_tasks, args=(q, arguments))
    # start the thread:
    thread.start()
    # wait for command output in main thread
    # output must be written from main thread
    while True:
        t = q.get()  # get next tuple or special "end" record
        if t is None:  # special end record?
            break  # yes!
        # unpack tuple:
        win, line = t
        if line is None:  # special create window command
            # use cmd as title and replace with actual window:
            windows[win] = CmdWindow(windows[win])
        else:
            windows[win].write(line)
    thread.join()  # wait for the run_tasks thread to end
    input('Pausing for termination...')  # wait for user to be finished looking at windows

if __name__ == '__main__':
    main()

Execute multiple commands in Linux using Python at the same time

I need to execute multiple commands in Linux using Python at the same time.
I don't want to run them one by one, command by command.
I tried to write this code, but I can't work out how to execute multiple commands at the same time using Python. I have also read about Python multithreading, but I don't know how to use it.
Code:
# -*- coding: utf-8 -*-
import os

commands = ['ping www.google.com', 'ping www.yahoo.com', 'ping www.hotmail.com']
count = 0
for com in commands:
    print "Start execute commands.."
    os.system(com)
    count += 1
    print "[OK] command "+str(count)+" running successfully."
else:
    print "Finish.."
How can I do that with Python and execute multiple commands at the same time?
Looks like a typical producer-consumer problem
import threading
import os

commands = ['ping www.google.com', 'ping www.yahoo.com', 'ping www.hotmail.com']

def worker_func():
    while commands:  # Checks if the list is not empty. Loop exits when the list becomes empty
        com = commands.pop(0)
        print "Start execute commands.."
        os.system(com)
        print "[OK] command " + com + " ran successfully."

workers = [threading.Thread(target=worker_func, name='thread_' + str(i)) for i in range(5)]  # Create 5 workers (consumers)
[worker.start() for worker in workers]  # Start working
[worker.join() for worker in workers]   # Wait for all workers to finish
Here I have created 5 worker threads. These threads run the function worker_func.
worker_func picks one element from the list and performs the job. When the list becomes empty the function returns (exits).
Note: read about the Global Interpreter Lock to understand where Python multithreading should not be used.
In this case the GIL (Global Interpreter Lock) should not affect you, because worker_func calls a subprocess and waits for it to complete. While the thread is waiting, the GIL is released to other threads.
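For comparison, here is a rough Python 3 sketch of the same idea using the standard library's thread pool; each worker thread spends its time blocked in a child process, so the GIL is not the limiting factor:
import subprocess
from concurrent.futures import ThreadPoolExecutor

commands = ['ping www.google.com', 'ping www.yahoo.com', 'ping www.hotmail.com']

def run(com):
    # blocks until the command exits and returns its status
    return com, subprocess.call(com, shell=True)

with ThreadPoolExecutor(max_workers=5) as pool:
    for com, status in pool.map(run, commands):
        print('[OK] {!r} finished with status {}'.format(com, status))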
I am suggesting two solutions, but there are many.
Simple solution:
Use & at the end of your commands to run them in the background:
commands = ['ping www.google.com &', 'ping www.yahoo.com &', 'ping www.hotmail.com &']
for com in commands:
    os.system(com)  # now commands will run in background
threading + Queue solution, with control over the maximum number of threads to spawn:
from Queue import Queue, Empty
import threading, os

def worker_func():
    while not stopped.is_set():
        try:
            # use the get_nowait() method for retrieving a queued item to
            # prevent the thread from blocking when the queue is empty
            com = q.get_nowait()
        except Empty:
            continue
        try:
            os.system(com)
        except Exception as e:
            print "[-] Error running command %s" % (str(e))
        finally:
            q.task_done()

commands = ['ping www.google.com', 'ping www.yahoo.com', 'ping www.hotmail.com']
thread_count = 4  # maximum parallel threads

stopped = threading.Event()
q = Queue()

print "-- Processing %s tasks in thread queue with %s thread limit" % (str(len(commands)), str(thread_count))

for item in commands:
    q.put(item)

for i in range(thread_count):
    t = threading.Thread(target=worker_func)
    # t.daemon = True  # enable to run threads as daemons
    t.start()

q.join()  # block until all tasks are done
stopped.set()
My solution doesn't start extra threads.
I use subprocess.Popen to run a command, store the Popen objects in a list in the first loop, and wait for the subprocesses to finish in the second:
from subprocess import Popen, PIPE

commands = ['ping www.google.com', 'ping www.yahoo.com', 'dir']
count = 0
processes = []
for com in commands:
    print "Start execute commands.."
    processes.append(Popen(com, shell=True))
    count += 1
    print "[OK] command "+str(count)+" running successfully."
else:
    print "Finish.."

for i, process in enumerate(processes):
    process.wait()
    print "Command #{} finished".format(i)
import threading
import os

def ping_url(command):
    os.system(command)

thread_list = []
commands = ['ping www.google.com', 'ping www.yahoo.com', 'ping www.hotmail.com']

for url in commands:
    # Instantiates the thread
    t = threading.Thread(target=ping_url, args=(url,))
    # Sticks the thread in a list so that it remains accessible
    thread_list.append(t)

# Starts threads
for thread in thread_list:
    thread.start()

# This blocks the calling thread until the thread whose join() method is called is terminated.
# From http://docs.python.org/2/library/threading.html#thread-objects
for thread in thread_list:
    thread.join()

# Demonstrates that the main process waited for threads to complete
print "Done"

Popen does not return in Python 2.7

I'm developing a process scheduler in Python. The idea is to create several threads from the main function and start an external process in each of these threads. The external process should continue to run until either it's finished or the main thread decides to stop it (by sending a kill signal) because the process' CPU time limit is exceeded.
The problem is that sometimes the Popen call blocks and fails to return. This code reproduces the problem with ~50% probability on my system (Ubuntu 14.04.3 LTS):
import os, time, threading, sys
from subprocess import Popen

class Process:
    def __init__(self, args):
        self.args = args

    def run(self):
        print("Run subprocess: " + " ".join(self.args))
        retcode = -1
        try:
            self.process = Popen(self.args)
            print("started a process")
            while self.process.poll() is None:
                # in the real code, check for the end condition here and send kill signal if required
                time.sleep(1.0)
            retcode = self.process.returncode
        except:
            print("unexpected error:", sys.exc_info()[0])
        print("process done, returned {}".format(retcode))
        return retcode

def main():
    processes = [Process(["/bin/cat"]) for _ in range(4)]
    # start all processes
    for p in processes:
        t = threading.Thread(target=Process.run, args=(p,))
        t.daemon = True
        t.start()
    print("all threads started")
    # wait for Ctrl+C
    while True:
        time.sleep(1.0)

main()
The output indicates that only 3 Popen() calls have returned:
Run subprocess: /bin/cat
Run subprocess: /bin/cat
Run subprocess: /bin/cat
Run subprocess: /bin/cat
started a process
started a process
started a process
all threads started
However, running ps shows that all four processes have in fact been started!
The problem does not show up when using Python 3.4, but I want to keep Python 2.7 compatibility.
Edit: the problem also goes away if I add some delay before starting each subsequent thread.
Edit 2: I did a bit of investigation and the blocking is caused by line 1308 in subprocess.py module, which tries to do some reading from a pipe in the parent process:
data = _eintr_retry_call(os.read, errpipe_read, 1048576)
There are a handful of bugs in python 2.7's subprocess module that can result in deadlock when calling the Popen constructor from multiple threads. They are fixed in later versions of Python, 3.2+ IIRC.
You may find that using the subprocess32 backport of Python 3.2/3.3's subprocess module resolves your issue.
*I was unable to locate the link to the actual bug report, but encountered it recently when dealing with a similar issue.
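If you go that route, a common pattern (assuming subprocess32 has been installed with pip on the Python 2.7 side) is to fall back to the standard module where the backport is unavailable:
try:
    import subprocess32 as subprocess  # thread-safe Popen backported from Python 3.2+
except ImportError:
    import subprocess                  # plain 2.7 stdlib as a fallback

p = subprocess.Popen(['/bin/cat'])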

How to collect output from a Python subprocess

I am trying to make a Python process that reads some input, processes it and prints out the result. The processing is done by a subprocess (Stanford's NER); for illustration I will use 'cat'. I don't know exactly how much output NER will give, so I run a separate thread to collect it all and print it out. The following example illustrates.
import sys
import threading
import subprocess

# start my subprocess
cat = subprocess.Popen(
    ['cat'],
    shell=False, stdout=subprocess.PIPE, stdin=subprocess.PIPE,
    stderr=None)

def subproc_cat():
    """ Reads the subprocess output and prints out """
    while True:
        line = cat.stdout.readline()
        if not line:
            break
        print("CAT PROC: %s" % line.decode('UTF-8'))

# a daemon that runs the above function
th = threading.Thread(target=subproc_cat)
th.setDaemon(True)
th.start()

# the main thread reads from stdin and feeds the subprocess
while True:
    line = sys.stdin.readline()
    print("MAIN PROC: %s" % line)
    if not line:
        break
    cat.stdin.write(bytes(line.strip() + "\n", 'UTF-8'))
    cat.stdin.flush()
This seems to work well when I enter text with the keyboard. However, if I try to pipe input into my script (cat file.txt | python3 my_script.py), a race condition seems to occur. Sometimes I get proper output, sometimes not, and sometimes it locks up. Any help would be appreciated!
I am running Ubuntu 14.04, Python 3.4.0. The solution should be platform-independent.
Add th.join() at the end, otherwise you may kill the thread prematurely, before it has processed all the output, when the main thread exits: daemon threads do not survive the main thread (or remove th.setDaemon(True) instead of adding th.join()).
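For example, the end of the script could look like this (a sketch of the fix; closing cat's stdin lets cat see EOF and exit, which in turn lets the reader thread finish so the join does not block forever):
cat.stdin.close()  # cat sees EOF, prints any remaining output and exits
th.join()          # wait for the reader thread to drain cat's stdout
cat.wait()         # reap the child process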

Python subprocess return code without waiting

My question is hopefully particular enough not to relate to any of the other ones that I've read. I want to use subprocess and multiprocessing to spawn a bunch of jobs serially and return the return code to me. The problem is that I don't want to wait() so I can spawn the jobs all at once, but I do want to know when each finishes so I can get its return code. I'm having this weird problem where if I poll() the process it won't run; it just hangs out in the activity monitor without running (I'm on a Mac). I thought I could use a watcher thread, but I'm hanging on the out_q.get(), which leads me to believe that maybe I'm filling up the buffer and deadlocking. I'm not sure how to get around this. This is basically what my code looks like. If anyone has any better ideas on how to do this I would be happy to completely change my approach.
import threading
from subprocess import Popen
from multiprocessing import Process, Queue  # imports inferred; not shown in the original snippet

def watchJob(p1, out_q):
    while p1.poll() == None:
        pass
    print "Job is done"
    out_q.put(p1.returncode)

def runJob(out_q):
    LOGFILE = open('job_to_run.log', 'w')
    p1 = Popen(['../../bin/jobexe', 'job_to_run'], stdout=LOGFILE)
    t = threading.Thread(target=watchJob, args=(p1, out_q))
    t.start()

out_q = Queue()
outlst = []
for i in range(len(nprocs)):
    proc = Process(target=runJob, args=(out_q,))
    proc.start()
    outlst.append(out_q.get())  # This hangs indefinitely
    proc.join()
You don't need either multiprocessing or threading here. You could run multiple child processes in parallel and collect their statuses, all in a single thread:
#!/usr/bin/env python3
from subprocess import Popen

def run(cmd, log_filename):
    with open(log_filename, 'wb', 0) as logfile:
        return Popen(cmd, stdout=logfile)

# start several subprocesses
processes = {run(['echo', c], 'subprocess.%s.log' % c) for c in 'abc'}
# now they all run in parallel

# report as soon as a child process exits
while processes:
    for p in processes:
        if p.poll() is not None:
            processes.remove(p)
            print('{} done, status {}'.format(p.args, p.returncode))
            break
p.args stores cmd in Python 3.3+, keep track of cmd yourself on earlier Python versions.
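For instance, on older versions you could keep the association yourself in a dict from each Popen object to its command; a sketch building on the run() helper above:
cmds = [['echo', c] for c in 'abc']
processes = {run(cmd, 'subprocess.%s.log' % cmd[1]): cmd for cmd in cmds}
while processes:
    for p, cmd in list(processes.items()):
        if p.poll() is not None:
            del processes[p]
            print('{} done, status {}'.format(cmd, p.returncode))
            break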
See also:
Python threading multiple bash subprocesses?
Python subprocess in parallel
Python: execute cat subprocess in parallel
Using Python's Multiprocessing module to execute simultaneous and separate SEAWAT/MODFLOW model runs
To limit the number of parallel jobs, a ThreadPool could be used (as shown in the first link):
#!/usr/bin/env python3
from multiprocessing.dummy import Pool  # use threads
from subprocess import Popen

def run_until_done(args):
    cmd, log_filename = args
    try:
        with open(log_filename, 'wb', 0) as logfile:
            p = Popen(cmd, stdout=logfile)
        return cmd, p.wait(), None
    except Exception as e:
        return cmd, None, str(e)

commands = ((('echo', str(d)), 'subprocess.%03d.log' % d) for d in range(500))
pool = Pool(128)  # 128 concurrent commands at a time
for cmd, status, error in pool.imap_unordered(run_until_done, commands):
    if error is None:
        fmt = '{cmd} done, status {status}'
    else:
        fmt = 'failed to run {cmd}, reason: {error}'
    print(fmt.format_map(vars()))  # or fmt.format(**vars()) on older versions
The thread pool in the example has 128 threads (no more, no less). It can't execute more than 128 jobs concurrently. As soon as any of the threads is free (done with a job), it takes another, and so on. The total number of jobs executing concurrently is limited by the number of threads. A new job doesn't wait for all 128 previous jobs to finish; it starts as soon as any of the old jobs is done.
If you're going to run watchJob in a thread, there's no reason to busy-loop with p1.poll; just call p1.wait() to block until the process finishes. Using the busy loop requires the GIL to constantly be released/re-acquired, which slows down the main thread, and also pegs the CPU, which hurts performance even more.
Also, if you're not using the stdout of the child process, you shouldn't send it to PIPE, because that could cause a deadlock if the process writes enough data to the stdout buffer to fill it up (which may actually be what's happening in your case). There's also no need to use multiprocessing here; just call Popen in the main thread, and then have the watchJob thread wait on the process to finish.
import threading
from subprocess import Popen
from Queue import Queue

def watchJob(p1, out_q):
    p1.wait()
    out_q.put(p1.returncode)

out_q = Queue()
outlst = []
p1 = Popen(['../../bin/jobexe', 'job_to_run'])
t = threading.Thread(target=watchJob, args=(p1, out_q))
t.start()
outlst.append(out_q.get())
t.join()
Edit:
Here's how to run multiple jobs concurrently this way:
out_q = Queue()
outlst = []
threads = []
num_jobs = 3
for _ in range(num_jobs):
    p = Popen(['../../bin/jobexe', 'job_to_run'])
    t = threading.Thread(target=watchJob, args=(p, out_q))
    t.start()
    threads.append(t)
    # Don't consume from the queue yet.

# All jobs are running, so now we can start
# consuming results from the queue.
for _ in range(num_jobs):
    outlst.append(out_q.get())

for t in threads:
    t.join()
