I have some code that takes an input path and starts a Spark job on the cluster, something like:
spark-submit driver.py -i input_path
Now I have a list of paths and I want to execute all of these simultaneously.
Here is what I tried:
import subprocess

base_command = 'spark-submit driver.py -i %s'
for path in paths:
    command = base_command % path
    subprocess.Popen(command, shell=True)
My hope was that all of the shell commands would be executed simultaneously, but instead I am noticing that it executes one command at a time.
How do I execute all the shell commands simultaneously?
Thanks
This is where multiprocessing.Pool comes in; it is designed for exactly this case. It maps many inputs across a pool of workers automatically.
import subprocess
from multiprocessing import Pool

def run_command(path):
    command = "spark-submit driver.py -i {}".format(path)
    subprocess.Popen(command, shell=True)

pool = Pool()
pool.map(run_command, paths)
Pool() creates a pool of worker processes (one per CPU core by default) and map distributes the paths across those workers, so the commands are launched in parallel. Pass Pool(len(paths)) if you want one worker for every item in paths at the same time.
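One caveat: because Popen returns as soon as the job is launched, each pool worker becomes free immediately and pool.map returns before the Spark jobs actually finish. A minimal sketch of a variant where each worker waits for its job; the paths shown are placeholders for your real input paths:

import subprocess
from multiprocessing import Pool

def run_command(path):
    # subprocess.call blocks until this particular job exits,
    # so the worker stays busy for the job's whole lifetime
    return subprocess.call("spark-submit driver.py -i {}".format(path), shell=True)

if __name__ == '__main__':
    paths = ['/data/a', '/data/b', '/data/c']   # placeholder paths
    pool = Pool(len(paths))                     # one worker per path, so all jobs start together
    exit_codes = pool.map(run_command, paths)   # returns once every job has finished
    print(exit_codes)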
I have searched and tried a lot of code for this topic. I am trying to run two Python files at the same time.
This is my attempt:
import subprocess
subprocess.run("py pop1.py & py pop2.py", shell=True)
But this executes the first Python file and then the second one. That is not the goal; my goal is to run both files at the same time.
subprocess can do this all on its own without invoking shell=True with the & bashism.
import subprocess
# start processes running in parallel
p1 = subprocess.Popen(['py', 'pop1.py'])
p2 = subprocess.Popen(['py', 'pop2.py'])
# wait for both processes to complete
p1.wait()
p2.wait()
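If there are more than two scripts, the same pattern generalizes to a list. A small sketch, assuming the same py launcher and script names from the question:

import subprocess

scripts = ['pop1.py', 'pop2.py']   # add further scripts here as needed
# start all of them in parallel
procs = [subprocess.Popen(['py', s]) for s in scripts]
# wait for every one of them to complete
for p in procs:
    p.wait()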
I have scripts I would like to execute in sequence with a time delay between each of them.
The intention is to run scripts which scan for a string in file names and import those files into a folder. The time delay is to give each script time to finish copying the files before moving on to the next one.
I have tried the solutions from questions already posed on Stack Overflow:
Running multiple Python scripts
Run a python script from another python script, passing in args
But I'm not understanding why the lines below don't work.
import time
import subprocess
subprocess.call(r'C:\Users\User\Documents\get summary into folder.py', shell=True)
time.sleep(100)
subprocess.call(r'C:\Users\User\Documents\get summaries into folder.py', shell=True)
time.sleep(100)
The script opens the files but doesn't run them.
A couple of things. First of all, time.sleep takes seconds as its argument, so you're waiting 100 s after each of these processes is spawned; I guess you meant 0.100. Anyway, if you just want to run your 2 scripts one after the other, it's better to use subprocess.Popen.wait so that you don't wait longer than necessary. Example below:
import time
import subprocess

test_cmd = "".join([
    "import time;",
    "print('starting script{}...');",
    "time.sleep(1);",
    "print('script{} done.')"
])

for i in range(2):
    subprocess.Popen(
        ["python", "-c", test_cmd.format(*[str(i)] * 2)]).wait()
    print('-' * 80)
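Applied back to the original question: passing a bare .py path through the shell on Windows likely hands the file to whatever application is associated with .py files, which would explain why the files open but don't run. A minimal sketch that invokes the interpreter explicitly and waits for each script to finish, using the paths from the question, so no fixed sleep is needed:

import subprocess
import sys

scripts = [
    r'C:\Users\User\Documents\get summary into folder.py',
    r'C:\Users\User\Documents\get summaries into folder.py',
]

for script in scripts:
    # run each script with the current interpreter and block until it finishes
    subprocess.call([sys.executable, script])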
From the script I am currently working on, I need to run another Python script which generates data. I use subprocess to run it:
cmd = 'python /home/usr/script.py arg1 arg2 arg3'
subprocess.Popen(cmd, shell=True)
But I have a problem. The other script generates a few directories in the 'current directory', meaning the directory it was run in, and I can't modify that script because it isn't mine. How do I set the current directory to the directory where I want the data to go?
Another small problem is that when I run subprocess.Popen() my script doesn't end. Should I run it in another way?
The best way is to use subprocess.call instead (it waits and terminates; Popen without the corresponding wait() may create a zombie process) and use the cwd= parameter to specify the current directory for the subprocess:
cmd = ['python','/home/usr/script.py','arg1','arg2','arg3']
return_code = subprocess.call(cmd, cwd="/some/dir")
(Also pass the command as a list and drop shell=True; you don't need it here.)
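A slightly fuller sketch of the same idea; the /some/dir path is only an example, and the makedirs call just guards against cwd= failing when the directory doesn't exist yet:

import os
import subprocess

data_dir = '/some/dir'                 # example: the directory where the generated data should land
os.makedirs(data_dir, exist_ok=True)   # cwd= raises an error if the directory is missing

cmd = ['python', '/home/usr/script.py', 'arg1', 'arg2', 'arg3']
return_code = subprocess.call(cmd, cwd=data_dir)
if return_code != 0:
    print('script.py exited with code', return_code)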
I have a Python script that takes command-line arguments. I get the command-line arguments by reading a Mongo database. I need to iterate over the Mongo query results and launch a separate process of the same script for each set of command-line arguments.
The key is, I need the launched processes to be:
separate processes that share nothing
easy to kill all at once when I need to stop them.
I think the command killall -9 script.py would work and satisfy the second constraint.
Edit 1
From the answer below, the launcher.py program looks like this:
import subprocess

def main():
    symbolPreDict = initializeGetMongoAllSymbols()
    keys = sorted(symbolPreDict.keys())
    for symbol in keys:
        # Display key.
        print(symbol)
        command = ['python', 'mc.py', '-s', str(symbol)]
        print(command)
        subprocess.call(command)

if __name__ == '__main__':
    main()
The problem is that mc.py has a call that blocks
receiver = multicast.MulticastUDPReceiver("192.168.0.2", symbolMCIPAddrStr, symbolMCPort)
while True:
    try:
        b = MD()
        data = receiver.read()  # This blocks
        ...
    except Exception as e:
        print(str(e))
When I run the launcher, it just executes one of the mc.py processes (there are at least 39 to launch). How do I modify the launcher program to run the launched script in the background, so that control returns to the launcher and it can launch more scripts?
Edit 2
The problem is solved by replacing subprocess.call(command) with subprocess.Popen(command)
One thing I noticed, though: if I run ps ax | grep mc.py, the PIDs are all different. I don't think I care, since I can kill them all pretty easily with killall.
[Correction] kill them with pkill -f xxx.py
There are several options for launching scripts from a script. The easiest are probably to use the subprocess or os modules.
I have done this several times to launch things to separate nodes on a cluster. Using os it might look something like this:
import os

for i in range(len(operations)):
    arg1, arg2 = operations[i]  # assuming each entry in operations is a pair of arguments
    os.system("python myScript.py {:} {:} > out.log".format(arg1, arg2))
Using killall, you should have no problem terminating processes spawned this way.
Another option is to use subprocess, which has a wide range of features and is much more flexible than os.system. An example might look like:
import subprocess

for i in range(len(operations)):
    arg1, arg2 = operations[i]  # assuming each entry is a pair of arguments, as above
    command = ['python', 'myScript.py', str(arg1), str(arg2)]
    subprocess.call(command)   # blocks until this run finishes before starting the next
In both of these methods, the processes are independent and share nothing other than a parent PID.
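Since the asker's Edit 2 found that replacing subprocess.call with subprocess.Popen fixed the blocking, here is a minimal sketch of what the non-blocking launcher could look like, with an explicit way to stop every child from the launcher itself (initializeGetMongoAllSymbols and mc.py come from the question):

import subprocess

symbolPreDict = initializeGetMongoAllSymbols()
procs = []
for symbol in sorted(symbolPreDict.keys()):
    # Popen returns immediately, so every mc.py instance runs in the background
    procs.append(subprocess.Popen(['python', 'mc.py', '-s', str(symbol)]))

# ... later, to stop them all without reaching for killall/pkill:
for p in procs:
    p.terminate()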
I am trying to migrate a bash script to Python.
The bash script runs multiple OS commands in parallel and then waits for them to finish before resuming, i.e.:
command1 &
command2 &
.
commandn &
wait
command
I want to achieve the same using Python subprocess. Is this possible? How can I wait for a subprocess.call command to finish before resuming?
You can still use Popen which takes the same input parameters as subprocess.call but is more flexible.
subprocess.call: The full function signature is the same as that of the Popen constructor - this function passes all supplied arguments directly through to that interface.
One difference is that subprocess.call blocks and waits for the subprocess to complete (it is built on top of Popen), whereas Popen doesn't block and consequently allows you to launch other processes in parallel.
Try the following:
from subprocess import Popen

commands = ['command1', 'command2']
procs = [Popen(i) for i in commands]
for p in procs:
    p.wait()
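If you also want to know how each command exited, wait() returns the process's return code, so you can collect them, for example:

exit_codes = [p.wait() for p in procs]   # one return code per command, in order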
Expanding on Aaron and Martin's answers, here is a solution that uses subprocess and Popen to run n processes in parallel:
import subprocess

commands = ['cmd1', 'cmd2', 'cmd3', 'cmd4', 'cmd5']
n = 2  # the number of parallel processes you want

for j in range((len(commands) + n - 1) // n):  # ceiling division so no command is skipped
    procs = [subprocess.Popen(i, shell=True)
             for i in commands[j * n: min((j + 1) * n, len(commands))]]
    for p in procs:
        p.wait()
I find this to be useful when using a tool like multiprocessing could cause undesired behavior.
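One limitation of the batch approach is that each batch waits for its slowest command before the next batch starts. A rough sketch of an alternative that keeps up to n commands running at all times by polling for finished processes (same commands and n as above):

import subprocess
import time

commands = ['cmd1', 'cmd2', 'cmd3', 'cmd4', 'cmd5']
n = 2
queued = list(commands)
running = []

while queued or running:
    # top up to n running processes
    while queued and len(running) < n:
        running.append(subprocess.Popen(queued.pop(0), shell=True))
    # keep only processes that are still running (poll() is None while alive)
    running = [p for p in running if p.poll() is None]
    time.sleep(0.1)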