I'm trying to run multiple exes (12 of them); because of computer resources I can spawn a maximum of 4 at a time before I get performance degradation.
I'm trying to find out if there is a way to call 4 exes at a time and, as soon as one of them finishes, call another exe to fill the resources that have freed up.
My current code does this:
excs = [r"path\to\exe\exe.exe", r"path\to\exe\exe.exe", r"path\to\exe\exe.exe", r"path\to\exe\exe.exe"]
running = [subprocess.Popen(ex) for ex in excs]
[process.wait() for process in running]
It repeats this process three times so that it runs all 12. Unfortunately, it means waiting for all of them to finish before moving on to the next set. Is there a more efficient way of doing this?
For the record, all of the exe's have different run times.
Python has ThreadPoolExecutor, which makes this very convenient:
import subprocess
from concurrent.futures import ThreadPoolExecutor
def create_pool(N, commands):
    pool = ThreadPoolExecutor(max_workers=N)
    for command in commands:
        pool.submit(subprocess.call, command)
    pool.shutdown(wait=False)

def main():
    N_WORKERS = 4
    commands = [job1, job2, ...]
    create_pool(N_WORKERS, commands)
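Wired up to the original question, a runnable sketch might look like this. The real exe paths are replaced with hypothetical stand-in commands (short python one-liners) so the example can run anywhere:

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for the 12 exes: short python commands,
# so the sketch is runnable without the real executables.
commands = [[sys.executable, "-c", "print(%d)" % i] for i in range(12)]

with ThreadPoolExecutor(max_workers=4) as pool:
    # At most 4 subprocesses run at once; as soon as one exits,
    # the next queued command is started.
    codes = list(pool.map(subprocess.call, commands))

print(codes)  # exit codes; all zeros if every process exited cleanly
```

Note that, unlike the answer's shutdown(wait=False), exiting the with block waits for all commands to finish before moving on, which matches the original "run all 12, 4 at a time" requirement.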
I'm running Python 2.7 on the GCE platform to do calculations. The GCE instances boot, install various packages, copy 80 GB of data from a storage bucket and run a "workermaster.py" script with nohup. The workermaster runs an infinite loop which checks a task-queue bucket for tasks. When the task bucket isn't empty it picks a random file (task) and passes the work to a calculation module. If there is nothing to do, the workermaster sleeps for a number of seconds and checks the task list again. The workermaster runs continuously until the instance is terminated (or something breaks!).
Currently this works quite well, but my problem is that my code only runs instances with a single CPU. If I want to scale up calculations I have to create many identical single-CPU instances and this means there is a large cost overhead for creating many 80 Gb disks and transferring the data to them each time, even though the calculation is only "reading" one small portion of the data for any particular calculation. I want to make everything more efficient and cost effective by making my workermaster capable of using multiple CPUs, but after reading many tutorials and other questions on SO I'm completely confused.
I thought I could just turn the important part of my workermaster code into a function, and then create a pool of processes that "call" it using the multiprocessing module. Once the workermaster loop is running on each CPU, the processes do not need to interact with each other or depend on each other in any way, they just happen to be running on the same instance. The workermaster prints out information about where it is in the calculation and I'm also confused about how it will be possible to tell the "print" statements from each process apart, but I guess that's a few steps from where I am now! My problems/confusion are that:
1) My workermaster "def" doesn't return any value, because it just starts an infinite loop, whereas every web example seems to have something in the format myresult = pool.map(.....); and
2) My workermaster "def" doesn't need any arguments/inputs - it just runs, whereas the examples of multiprocessing that I have seen on SO and in the Python docs seem to have iterables.
In case it is important, the simplified version of the workermaster code is:
# module imports are here
# filepath definitions go here

def workermaster():
    while True:
        tasklist = cloudstoragefunctions.getbucketfiles('<my-task-queue-bucket>')
        if tasklist:
            tasknumber = random.randint(0, len(tasklist) - 1)
            assignedtask = tasklist[tasknumber]
            print 'Assigned task is now: ' + assignedtask
            subprocess.call('gsutil -q cp gs://<my-task-queue-bucket>/' + assignedtask + ' "' + taskfilepath + assignedtask + '"', shell=True)
            tasktype = assignedtask.split('#')[0]
            if tasktype == 'Calculation':
                currentcalcid = assignedtask.split('#')[1]
                currentfilenumber = assignedtask.split('#')[2].replace('part', '')
                currentstartfile = assignedtask.split('#')[3]
                currentendfile = assignedtask.split('#')[4].replace('.csv', '')
                calcmodule.docalc(currentcalcid, currentfilenumber, currentstartfile, currentendfile)
            elif tasktype == 'Analysis':
                pass  # set up and run analysis module, etc.
            print ' Operation completed!'
            os.remove(taskfilepath + assignedtask)
        else:
            print 'There are no tasks to be processed. Going to sleep...'
            time.sleep(30)
I'm trying to "call" the function multiple times using the multiprocessing module. I think I need to use the "pool" method, so I've tried this:
import multiprocessing

if __name__ == "__main__":
    p = multiprocessing.Pool()
    pool_output = p.map(workermaster, [])
My understanding from the docs is that the __name__ line is there only as a workaround for doing multiprocessing on Windows (which I am using for development, but GCE is on Linux). The p = multiprocessing.Pool() line creates a pool of workers equal to the number of system CPUs, as no argument is specified. If the number of CPUs were 1, then I would expect the code to behave as it did before I attempted to use multiprocessing. The last line is the one that I don't understand. I thought it was telling each of the processors in the pool that the "target" (thing to run) is workermaster. From the docs there appears to be a compulsory argument which is an iterable, but I don't really understand what this is in my case, as workermaster doesn't take any arguments. I've tried passing it an empty list, an empty string and empty brackets (a tuple?) and it doesn't do anything.
Would it be possible for someone to help me out? There are lots of discussions about using multiprocessing, and this thread Mulitprocess Pools with different functions and this one python code with mulitprocessing only spawns one process each time seem to be close to what I am doing, but both still have iterables as arguments. If there is anything critical that I have left out, please advise and I will modify my post - thank you to anyone who can help!
Pool() is useful if you want to run the same function with different arguments.
If you want to run a function only once, then use a normal Process().
If you want to run the same function 2 times, then you can manually create 2 Process() instances.
If you want to use Pool() to run a function 2 times, then add a list with 2 elements (even if you don't need arguments), because that tells Pool() to run it 2 times.
But if you run the function 2 times with the same folder, then it may run the same task 2 times. If you run it 5 times, then it may run the same task 5 times. I don't know if that is what you need.
As for Ctrl+C, I found Catch Ctrl+C / SIGINT and exit multiprocesses gracefully in python on Stack Overflow, but I don't know if it resolves your problem.
I have a loop with a highly time-consuming process. Instead of waiting for each process to complete before moving to the next iteration, is it possible to run the process and just move to the next iteration without waiting for it to complete?
Example: Given a text, the script should try to find matching links from the Web and files on the local disk. Both simply return a list of links or paths.
for proc in (web_search, file_search):
    results = proc(text)
    yield from results
What I have as a solution is using a timer while doing the job. If the time exceeds the waiting time, the process should be moved to a tray and asked to work from there. Then I go to the next iteration and repeat the same. After my loop is over, I collect the results from the processes moved to the tray.
For simple cases, where the objective is to let each task run simultaneously, we can use the Thread class from the threading module.
So we can tackle the issue like this: we make each task a Thread and ask it to put its results in a list or some other collection. The code is given below:
from threading import Thread

results = []

def add_to_collection(proc, args, collection):
    '''proc is the function, args are the arguments to pass to it.
    collection is our container (here it is the list results) for
    collecting results.'''
    result = proc(*args)
    collection.append(result)
    print("Completed:", proc)

# Now we do our time consuming tasks
for proc in (web_search, file_search):
    # We assume proc takes no arguments
    t = Thread(target=add_to_collection, args=(proc, (), results))
    t.start()
For complex tasks, as mentioned in the comments, it's better to go with multiprocessing.pool.Pool.
I have a multiprocessing program in python, which spawns several sub-processes and manages them (restarting them if the children identify problems, etc.). Each subprocess is unique and its setup depends on a configuration file. The general structure of the master program is:
def main():
    messageQueue = multiprocessing.Queue()
    errorQueue = multiprocessing.Queue()
    childProcesses = {}
    for required_children in configuration:
        childProcesses[required_children] = MultiprocessChild(errorQueue, messageQueue, *args, **kwargs)
    for child_process in childProcesses:
        childProcesses[child_process].start()
    while True:
        # This is to check if the configuration file for processes has changed, e.g. check every 5 minutes
        if local_uptime > configuration_check_timer:
            reload_configuration()
            killChildProcessIfConfigurationChanged()
            relaunchChildProcessIfConfigurationChanged()
        # We want to relaunch error processes immediately (so: while statement)
        # Errors are not always crashes. Sometimes other system parameters change that require a relaunch with different, ChildProcess-specific configurations.
        while not errorQueue.empty():
            _error_, _childprocess_ = errorQueue.get()
            killChildProcess(_childprocess_)
            relaunchChildProcess(_childprocess_)
            print(_error_)
        # Messages are allowed to lag if a configuration_timer is going to trigger or errorQueue gets something (so: if statement)
        if not messageQueue.empty():
            print(messageQueue.get())
Is there a way to prevent the contents of the infinite while True loop from taking up 100% CPU? If I add a sleep at the end of the loop (e.g. sleep for 10 s), then errors will take 10 s to correct, and messages will take 10 s to flush.
If, on the other hand, there were a way to time.sleep() for the duration of the configuration_check_timer while still running code when messageQueue or errorQueue get something, that would be nice.
I have some classifiers which I want to evaluate on one sample. This task can be run in parallel since the classifiers are independent of each other. This means that I want to parallelize it.
I tried it with python and also as a bash script. The problem is that when I run the program for the first time, it takes 30-40 s to finish. When I run it multiple times consecutively, it takes just 1-3 s. Even if I feed the classifiers different input I get different results, so there seems to be no caching. When I run some other program and afterwards rerun this one, it again takes 40 s to finish.
I also observed in htop that CPUs are not that much utilized when the program is run for the first time but then when I rerun it again and again the CPUs are fully utilized.
Can someone please explain this strange behaviour to me? How can I avoid it, so that even the first run of the program is fast?
Here is the python code:
import time
import os
from fastText import load_model
from joblib import delayed, Parallel, cpu_count
import json

os.system("taskset -p 0xff %d" % os.getpid())

def format_duration(start_time, end_time):
    m, s = divmod(end_time - start_time, 60)
    h, m = divmod(m, 60)
    return "%d:%02d:%02d" % (h, m, s)

def classify(x, classifier_name, path):
    f = load_model(path + os.path.sep + classifier_name)
    labels, probabilities = f.predict(x, 2)
    if labels[0] == '__label__True':
        return classifier_name
    else:
        return None

if __name__ == '__main__':
    with open('classifier_names.json') as json_data:
        classifiers = json.load(json_data)
    x = "input_text"
    start_time = time.time()
    Parallel(n_jobs=cpu_count(), verbose=100, backend='multiprocessing', pre_dispatch='all') \
        (delayed(classify)(x, classifier, 'clfs/') for classifier in classifiers)
    end_time = time.time()
    print(format_duration(start_time, end_time))
Here is the bash code:
#!/usr/bin/env bash

N=4
START_TIME=$SECONDS

open_sem(){
    mkfifo pipe-$$
    exec 3<>pipe-$$
    rm pipe-$$
    local i=$1
    for((;i>0;i--)); do
        printf %s 000 >&3
    done
}

run_with_lock(){
    local x
    read -u 3 -n 3 x && ((0==x)) || exit $x
    (
        "$@"
        printf '%.3d' $? >&3
    )&
}

open_sem $N
for d in classifiers/* ; do
    run_with_lock ~/fastText/fasttext predict "$d" test.txt
done
ELAPSED_TIME=$(($SECONDS - $START_TIME))
echo time taken $ELAPSED_TIME seconds
EDITED
The bigger picture is that I am running a flask app with 2 API methods. Each of them calls the function that parallelizes the classification. When I make requests, it behaves the same way as the program above. The first request to method A takes a long time, and then subsequent requests take about 1 s. When I switch to method B it is the same behaviour as with method A. If I switch between method A and method B several times, like A, B, A, B, then each request takes about 40 s to finish.
One approach is to modify your python code to use an event loop, stay running all the time, and execute new jobs in parallel whenever they are detected. One way to do this is to have a job directory, and place a file in that directory whenever there is a new job to do. The python script should also move completed jobs out of that directory to prevent running them more than once. How to run an function when anything changes in a dir with Python Watchdog?
Another option is to use a fifo file which is piped to the python script, and add new lines to that file for new jobs. https://www.linuxjournal.com/content/using-named-pipes-fifos-bash
I personally dislike parallelizing in python, and prefer to parallelize in bash using GNU parallel. To do it this way, I would:
implement the event loop and jobs directory (or the fifo-file job queue) using bash and GNU parallel
modify the python script to remove all the parallel code, read each jobspec from stdin, and process each one serially in a loop
pipe jobs to parallel, which pipes them to ncpu python processes, each of which runs forever waiting for the next job from stdin
e.g., something like:
run_jobs.sh:
mkfifo jobs
cat jobs | parallel --pipe --round-robin -n1 ~/fastText/fasttext
queue_jobs.sh:
echo jobspec >> jobs
.py:
for jobspec in sys.stdin:
...
This has the disadvantage that all ncpu python processes may have the slow startup problem, but they can stay running indefinitely, so the problem becomes insignificant, and the code is much simpler and easier to debug and maintain.
Using a jobs directory and a file for each jobspec instead of a fifo jobs queue requires slightly more code, but it also makes it more straightforward to see which jobs are queued and which jobs are done.
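The serial stdin worker described above can be fleshed out like this (handle is a hypothetical stand-in for the real fasttext call):

```python
import sys

def handle(jobspec):
    # Hypothetical stand-in for the real classification call.
    return 'done ' + jobspec

def main(stream):
    # Read jobspecs one per line and process them serially;
    # GNU parallel provides the concurrency across processes.
    results = []
    for line in stream:
        jobspec = line.strip()
        if jobspec:
            results.append(handle(jobspec))
    return results

if __name__ == '__main__':
    # Demo with an in-memory stream; in production: main(sys.stdin)
    print(main(['Calculation#1\n', 'Analysis#2\n']))
```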
I am writing a Python tool that will automatically do a number of operations for me. One part of the automation is the parallel processing of n LSDYNA finite element simulations. I want to send the n simulations to a pool and have them distributed to a user-specified number of processors, x. As one simulation terminates, another should be sent from the pool to the idle processor until the pool is empty. At that time, the rest of the python code should continue to execute. If this code were working correctly on my Windows machine, I would expect to see x cmd windows running LSDYNA at any time (until the pool is empty).
I have read a number of similar questions and they all seem to end up using the multiprocessing module. I've written code that I thought was correct; however, when I execute it, nothing happens. There are no error messages in the terminal window, and I do not get any LSDYNA output.
I do have a windows batch script that does work and does the same thing, in case that would be helpful to anyone.
In case I'm doing something completely wrong, a note about LSDYNA: When run via command line, each simulation runs in its own terminal window. The output files are written to the current directory at the time of command execution. The command format is:
"C:\LSDYNA\program\ls971_s_R5.1.1_winx64_p.exe" i=input.k ncpu=1 memory=100m
This is the Python code that I've come up with:
import os
import multiprocessing as mp
import subprocess
import glob

def run_LSDYNA(individual_sim_dir, model_name, solver_path):
    os.chdir(individual_sim_dir)
    os.environ['lstc_license'] = 'network'
    os.environ['lstc_license_server'] = 'xxx.xx.xx.xx'
    subprocess.call([solver_path, "i=%s" % model_name, "ncpu=1", "memory=100m"])

def parallel(sim_dir, procs, model_name, solver_path):
    run_dirs = []
    for individual_sim_dir in glob.glob(os.path.join(sim_dir, 'channel_*')):
        run_dirs.append(individual_sim_dir)
    pool = mp.Pool(processes=procs)
    args = ((run_dir, model_name, solver_path) for run_dir in run_dirs)
    simulations = pool.map_async(run_LSDYNA, args)
    simulations.wait()

if __name__ == "__main__":
    mp.freeze_support()
    parallel(r'C:\Users\me\Desktop\script_test\Lower_Leg_Sims', 2, 'boot.k', r"C:\LSDYNA\program\ls971_s_R5.1.1_winx64_p.exe")