Python mutiprocessing using Slurm: <Finalize object, dead> stalls - python

I have reserved some nodes on a SLURM cluster and want to run a python script on these nodes.
On one node (server) a python script should fill a queue with jobs and dispatch these jobs to the clients.
Most of the time this works fine, but occasionally the script stalls.
When using Ctrl+C it turns out that in that case one (or sometimes more) nodes seem to be stuck in <Finalize object, dead>:
^Csrun: interrupt (one more within 1 sec to abort)
srun: task 30: running
srun: tasks 0-29,31-39: exited
^Csrun: sending Ctrl-C to job 1075185.14
Exception KeyboardInterrupt: KeyboardInterrupt() in <Finalize object, dead> ignored
srun: Job step aborted: Waiting up to 2 seconds for job step to finish.
slurmd[cluster-112]: *** STEP 1075185.14 KILLED AT 2014-04-03T09:11:23 WITH SIGNAL 9 ***
I have no clue what the reason could be. Maybe, it looks like something related to the garbage collector.
This is the script I run:
#!/usr/bin/env
import os
import multiprocessing.managers
import Queue
import sys
import subprocess
import socket
import errno
class QueueManager(multiprocessing.managers.SyncManager):
pass
def worker(i, my_slurm_proc_id):
print 'hello %i (proc=%i)' % (i, my_slurm_proc_id)
time.sleep(0.1)
pass
def run_server(first_slurm_node, N_procs):
queue = Queue.Queue()
barrier = multiprocessing.BoundedSemaphore(N_procs-1)
QueueManager.register('get_queue', callable=lambda: queue)
QueueManager.register('get_barrier', callable=lambda: barrier)
for i in range(5000):
queue.put(i)
m = QueueManager(address=(first_slurm_node, 50000), authkey='abracadabra')
m.start()
for i in range(N_procs-1):
barrier.acquire(True)
m.get_queue().join() # somehow just 'queue.join()' doesn't work here
def run_client(my_slurm_proc_id, first_slurm_node):
QueueManager.register('get_queue')
QueueManager.register('get_barrier')
m = QueueManager(address=(first_slurm_node, 50000), authkey='abracadabra')
m.connect()
barrier = m.get_barrier()
barrier.acquire(True)
queue = m.get_queue()
while not queue.empty():
try:
data = queue.get_nowait()
except Queue.Empty:
break
worker(data, my_slurm_proc_id)
queue.task_done()
queue = None
barrier.release()
barrier = None
def main():
slurm_job_nodelist = subprocess.check_output('scontrol show hostname'.split(' ') + [os.environ['SLURM_JOB_NODELIST']]).split('\n')
master_node = slurm_job_nodelist[0]
my_slurm_proc_id = int(os.environ['SLURM_PROCID'])
N_procs = int(os.environ['SLURM_NPROCS'])
if my_slurm_proc_id == 0:
run_server(master_node, N_procs)
else:
run_client(my_slurm_proc_id, master_node)
if __name__ == '__main__':
main()

Related

limit number of threads used by a child process launched with ``multiprocessing.Process`

I'm trying to launch a function (my_function) and stop its execution after a certain time is reached.
So i challenged multiprocessing library and everything works well. Here is the code, where my_function() has been changed to only create a dummy message.
from multiprocessing import Queue, Process
from multiprocessing.queues import Empty
import time
timeout=1
# timeout=3
def my_function(something):
time.sleep(2)
return f'my message: {something}'
def wrapper(something, queue):
message ="too late..."
try:
message = my_function(something)
return message
finally:
queue.put(message)
try:
queue = Queue()
params = ("hello", queue)
child_process = Process(target=wrapper, args=params)
child_process.start()
output = queue.get(timeout=timeout)
print(f"ok: {output}")
except Empty:
timeout_message = f"Timeout {timeout}s reached"
print(timeout_message)
finally:
if 'child_process' in locals():
child_process.kill()
You can test and verify that depending on timeout=1 or timeout=3, i can trigger an error or not.
My main problem is that the real my_function() is a torch model inference for which i would like to limit the number of threads (to 4 let's say)
One can easily do so if my_function were in the main process, but in my example i tried a lot of tricks to limit it in the child process without any success (using threadpoolctl.threadpool_limits(4), torch.set_num_threads(4), os.environ["OMP_NUM_THREADS"]=4, os.environ["MKL_NUM_THREADS"]=4).
I'm completely open to other solution that can monitor the time execution of a function while limiting the number of threads used by this function.
thanks
Regards
You can limit simultaneous process with Pool. (https://docs.python.org/3/library/multiprocessing.html#module-multiprocessing.pool)
You can set max tasks done per child. Check it out.
Here you have a sample from superfastpython by Jason Brownlee:
# SuperFastPython.com
# example of limiting the number of tasks per child in the process pool
from time import sleep
from multiprocessing.pool import Pool
from multiprocessing import current_process
# task executed in a worker process
def task(value):
# get the current process
process = current_process()
# report a message
print(f'Worker is {process.name} with {value}', flush=True)
# block for a moment
sleep(1)
# protect the entry point
if __name__ == '__main__':
# create and configure the process pool
with Pool(2, maxtasksperchild=3) as pool:
# issue tasks to the process pool
for i in range(10):
pool.apply_async(task, args=(i,))
# close the process pool
pool.close()
# wait for all tasks to complete
pool.join()

Python threads and linux ioctl wait

I have the following toy example for Python threading module
from __future__ import print_function
import threading
import time
import signal
import sys
import os
import time
class ThreadShutdown(Exception):
# Custom exception to allow clean thread exit
pass
def thread_shutdown(signum, frame):
print(" o Signal {} caught and raising ThreadShutdown exception".format(signum))
raise ThreadShutdown
def main():
"""
Register the signal handlers needed to stop
cleanly the child accessing thread
"""
signal.signal(signal.SIGTERM, thread_shutdown)
signal.signal(signal.SIGINT, thread_shutdown)
test_run_seconds = 120
try:
thread = ChildThread()
thread.start()
time.sleep(1)
while test_run_seconds > 0:
test_run_seconds -= 1
print(" o [{}] remaining time is {} seconds".format(time.asctime( time.localtime(time.time()) ), test_run_seconds))
time.sleep(1)
except ThreadShutdown:
thread.shutdown_flag.set()
thread.join()
print(" o ThreadShutdown procedure complete")
return
proc.terminate()
thread.shutdown_flag.set()
thread.join()
print(" o Test terminated")
class ChildThread(threading.Thread):
def __init__(self):
threading.Thread.__init__(self)
self.shutdown_flag = threading.Event()
def run(self):
while not self.shutdown_flag.is_set():
print(" o [{}] is the current time in child, sleep for 10s".format(time.asctime( time.localtime(time.time()))))
time.sleep(10)
return
if __name__ == "__main__":
sys.exit(main())
which behaves as expected (the main thread counts every second while the spawned thread prints only every 10 seconds.
I was trying to understand the behaviour of the same code snippet in presence of blocking waits in kernel mode in the spawned thread. For example, assume that the spawned thread now goes into a killable wait in an ioctl with a timeout of 10s, I would still expect to have the main thread counting every second. For some reason, it instead counts every 10s, as if it was blocked as well in the wait of the spawned thread. What is the reason?

Multiprocessing callback message

I have long running process, that I want to keep track about in which state it currently is in. There is N processes running in same time therefore multiprocessing issue.
I pass Queue into process to report messages about state, and this Queue is then read(if not empty) in thread every couple of second.
I'm using Spider on windows as environment and later described behavior is in its console. I did not try it in different env.
from multiprocessing import Process,Queue,Lock
import time
def test(process_msg: Queue):
try:
process_msg.put('Inside process message')
# process...
return # to have exitstate = 0
except Exception as e:
process_msg.put(e)
callback_msg = Queue()
if __name__ == '__main__':
p = Process(target = test,
args = (callback_msg,))
p.start()
time.sleep(5)
print(p)
while not callback_msg.empty():
msg = callback_msg.get()
if type(msg) != Exception:
tqdm.write(str(msg))
else:
raise msg
Problem is that whatever I do with code, it never reads what is inside the Queue(also because it never puts anything in it). Only when I switch to dummy version, which runs similary to threading on only 1 CPU from multiprocessing.dummy import Process,Queue,Lock
Apparently the test function have to be in separate file.

python multiprocessing pool timeout

I want to use multiprocessing.Pool, but multiprocessing.Pool can't abort a task after a timeout. I found solution and some modify it.
from multiprocessing import util, Pool, TimeoutError
from multiprocessing.dummy import Pool as ThreadPool
import threading
import sys
from functools import partial
import time
def worker(y):
print("worker sleep {} sec, thread: {}".format(y, threading.current_thread()))
start = time.time()
while True:
if time.time() - start >= y:
break
time.sleep(0.5)
# show work progress
print(y)
return y
def collect_my_result(result):
print("Got result {}".format(result))
def abortable_worker(func, *args, **kwargs):
timeout = kwargs.get('timeout', None)
p = ThreadPool(1)
res = p.apply_async(func, args=args)
try:
# Wait timeout seconds for func to complete.
out = res.get(timeout)
except TimeoutError:
print("Aborting due to timeout {}".format(args[1]))
# kill worker itself when get TimeoutError
sys.exit(1)
else:
return out
def empty_func():
pass
if __name__ == "__main__":
TIMEOUT = 4
util.log_to_stderr(util.DEBUG)
pool = Pool(processes=4)
# k - time to job sleep
featureClass = [(k,) for k in range(20, 0, -1)] # list of arguments
for f in featureClass:
# check available worker
pool.apply(empty_func)
# run job with timeout
abortable_func = partial(abortable_worker, worker, timeout=TIMEOUT)
pool.apply_async(abortable_func, args=f, callback=collect_my_result)
time.sleep(TIMEOUT)
pool.terminate()
print("exit")
main modification - worker process exit with sys.exit(1). It's kill worker process and kill job thread, but i'm not sure that this solution is good. What potential problems can i get, when process terminate itself with running job?
There is no implicit risk in stopping a running job, the OS will take care of correctly terminating the process.
If your job is writing on files, you might end up with lots of truncated files on your disk.
Some small issue might also occur if you write on DBs or if you are connected with some remote process.
Nevertheless, Python standard Pool does not support worker termination on task timeout. Terminating processes abruptly might lead to weird behaviour within your application.
Pebble processing Pool does support timing-out tasks.
from pebble import ProcessPool
from concurrent.futures import TimeoutError
TIMEOUT_SECONDS = 5
def function(one, two):
return one + two
with ProcessPool() as pool:
future = pool.schedule(function, args=(1, 2), timeout=TIMEOUT_SECONDS)
try:
result = future.result()
except TimeoutError:
print("Future: %s took more than 5 seconds to complete" % future)

Using Python's Multiprocessing module to execute simultaneous and separate SEAWAT/MODFLOW model runs

I'm trying to complete 100 model runs on my 8-processor 64-bit Windows 7 machine. I'd like to run 7 instances of the model concurrently to decrease my total run time (approx. 9.5 min per model run). I've looked at several threads pertaining to the Multiprocessing module of Python, but am still missing something.
Using the multiprocessing module
How to spawn parallel child processes on a multi-processor system?
Python Multiprocessing queue
My Process:
I have 100 different parameter sets I'd like to run through SEAWAT/MODFLOW to compare the results. I have pre-built the model input files for each model run and stored them in their own directories. What I'd like to be able to do is have 7 models running at a time until all realizations have been completed. There needn't be communication between processes or display of results. So far I have only been able to spawn the models sequentially:
import os,subprocess
import multiprocessing as mp
ws = r'D:\Data\Users\jbellino\Project\stJohnsDeepening\model\xsec_a'
files = []
for f in os.listdir(ws + r'\fieldgen\reals'):
if f.endswith('.npy'):
files.append(f)
## def work(cmd):
## return subprocess.call(cmd, shell=False)
def run(f,def_param=ws):
real = f.split('_')[2].split('.')[0]
print 'Realization %s' % real
mf2k = r'c:\modflow\mf2k.1_19\bin\mf2k.exe '
mf2k5 = r'c:\modflow\MF2005_1_8\bin\mf2005.exe '
seawatV4 = r'c:\modflow\swt_v4_00_04\exe\swt_v4.exe '
seawatV4x64 = r'c:\modflow\swt_v4_00_04\exe\swt_v4x64.exe '
exe = seawatV4x64
swt_nam = ws + r'\reals\real%s\ss\ss.nam_swt' % real
os.system( exe + swt_nam )
if __name__ == '__main__':
p = mp.Pool(processes=mp.cpu_count()-1) #-leave 1 processor available for system and other processes
tasks = range(len(files))
results = []
for f in files:
r = p.map_async(run(f), tasks, callback=results.append)
I changed the if __name__ == 'main': to the following in hopes it would fix the lack of parallelism I feel is being imparted on the above script by the for loop. However, the model fails to even run (no Python error):
if __name__ == '__main__':
p = mp.Pool(processes=mp.cpu_count()-1) #-leave 1 processor available for system and other processes
p.map_async(run,((files[f],) for f in range(len(files))))
Any and all help is greatly appreciated!
EDIT 3/26/2012 13:31 EST
Using the "Manual Pool" method in #J.F. Sebastian's answer below I get parallel execution of my external .exe. Model realizations are called up in batches of 8 at a time, but it doesn't wait for those 8 runs to complete before calling up the next batch and so on:
from __future__ import print_function
import os,subprocess,sys
import multiprocessing as mp
from Queue import Queue
from threading import Thread
def run(f,ws):
real = f.split('_')[-1].split('.')[0]
print('Realization %s' % real)
seawatV4x64 = r'c:\modflow\swt_v4_00_04\exe\swt_v4x64.exe '
swt_nam = ws + r'\reals\real%s\ss\ss.nam_swt' % real
subprocess.check_call([seawatV4x64, swt_nam])
def worker(queue):
"""Process files from the queue."""
for args in iter(queue.get, None):
try:
run(*args)
except Exception as e: # catch exceptions to avoid exiting the
# thread prematurely
print('%r failed: %s' % (args, e,), file=sys.stderr)
def main():
# populate files
ws = r'D:\Data\Users\jbellino\Project\stJohnsDeepening\model\xsec_a'
wdir = os.path.join(ws, r'fieldgen\reals')
q = Queue()
for f in os.listdir(wdir):
if f.endswith('.npy'):
q.put_nowait((os.path.join(wdir, f), ws))
# start threads
threads = [Thread(target=worker, args=(q,)) for _ in range(8)]
for t in threads:
t.daemon = True # threads die if the program dies
t.start()
for _ in threads: q.put_nowait(None) # signal no more files
for t in threads: t.join() # wait for completion
if __name__ == '__main__':
mp.freeze_support() # optional if the program is not frozen
main()
No error traceback is available. The run() function performs its duty when called upon a single model realization file as with mutiple files. The only difference is that with multiple files, it is called len(files) times though each of the instances immediately closes and only one model run is allowed to finish at which time the script exits gracefully (exit code 0).
Adding some print statements to main() reveals some information about active thread-counts as well as thread status (note that this is a test on only 8 of the realization files to make the screenshot more manageable, theoretically all 8 files should be run concurrently, however the behavior continues where they are spawn and immediately die except one):
def main():
# populate files
ws = r'D:\Data\Users\jbellino\Project\stJohnsDeepening\model\xsec_a'
wdir = os.path.join(ws, r'fieldgen\test')
q = Queue()
for f in os.listdir(wdir):
if f.endswith('.npy'):
q.put_nowait((os.path.join(wdir, f), ws))
# start threads
threads = [Thread(target=worker, args=(q,)) for _ in range(mp.cpu_count())]
for t in threads:
t.daemon = True # threads die if the program dies
t.start()
print('Active Count a',threading.activeCount())
for _ in threads:
print(_)
q.put_nowait(None) # signal no more files
for t in threads:
print(t)
t.join() # wait for completion
print('Active Count b',threading.activeCount())
**The line which reads "D:\\Data\\Users..." is the error information thrown when I manually stop the model from running to completion. Once I stop the model running, the remaining thread status lines get reported and the script exits.
EDIT 3/26/2012 16:24 EST
SEAWAT does allow concurrent execution as I've done this in the past, spawning instances manually using iPython and launching from each model file folder. This time around, I'm launching all model runs from a single location, namely the directory where my script resides. It looks like the culprit may be in the way SEAWAT is saving some of the output. When SEAWAT is run, it immediately creates files pertaining to the model run. One of these files is not being saved to the directory in which the model realization is located, but in the top directory where the script is located. This is preventing any subsequent threads from saving the same file name in the same location (which they all want to do since these filenames are generic and non-specific to each realization). The SEAWAT windows were not staying open long enough for me to read or even see that there was an error message, I only realized this when I went back and tried to run the code using iPython which directly displays the printout from SEAWAT instead of opening a new window to run the program.
I am accepting #J.F. Sebastian's answer as it is likely that once I resolve this model-executable issue, the threading code he has provided will get me where I need to be.
FINAL CODE
Added cwd argument in subprocess.check_call to start each instance of SEAWAT in its own directory. Very key.
from __future__ import print_function
import os,subprocess,sys
import multiprocessing as mp
from Queue import Queue
from threading import Thread
import threading
def run(f,ws):
real = f.split('_')[-1].split('.')[0]
print('Realization %s' % real)
seawatV4x64 = r'c:\modflow\swt_v4_00_04\exe\swt_v4x64.exe '
cwd = ws + r'\reals\real%s\ss' % real
swt_nam = ws + r'\reals\real%s\ss\ss.nam_swt' % real
subprocess.check_call([seawatV4x64, swt_nam],cwd=cwd)
def worker(queue):
"""Process files from the queue."""
for args in iter(queue.get, None):
try:
run(*args)
except Exception as e: # catch exceptions to avoid exiting the
# thread prematurely
print('%r failed: %s' % (args, e,), file=sys.stderr)
def main():
# populate files
ws = r'D:\Data\Users\jbellino\Project\stJohnsDeepening\model\xsec_a'
wdir = os.path.join(ws, r'fieldgen\reals')
q = Queue()
for f in os.listdir(wdir):
if f.endswith('.npy'):
q.put_nowait((os.path.join(wdir, f), ws))
# start threads
threads = [Thread(target=worker, args=(q,)) for _ in range(mp.cpu_count()-1)]
for t in threads:
t.daemon = True # threads die if the program dies
t.start()
for _ in threads: q.put_nowait(None) # signal no more files
for t in threads: t.join() # wait for completion
if __name__ == '__main__':
mp.freeze_support() # optional if the program is not frozen
main()
I don't see any computations in the Python code. If you just need to execute several external programs in parallel it is sufficient to use subprocess to run the programs and threading module to maintain constant number of processes running, but the simplest code is using multiprocessing.Pool:
#!/usr/bin/env python
import os
import multiprocessing as mp
def run(filename_def_param):
filename, def_param = filename_def_param # unpack arguments
... # call external program on `filename`
def safe_run(*args, **kwargs):
"""Call run(), catch exceptions."""
try: run(*args, **kwargs)
except Exception as e:
print("error: %s run(*%r, **%r)" % (e, args, kwargs))
def main():
# populate files
ws = r'D:\Data\Users\jbellino\Project\stJohnsDeepening\model\xsec_a'
workdir = os.path.join(ws, r'fieldgen\reals')
files = ((os.path.join(workdir, f), ws)
for f in os.listdir(workdir) if f.endswith('.npy'))
# start processes
pool = mp.Pool() # use all available CPUs
pool.map(safe_run, files)
if __name__=="__main__":
mp.freeze_support() # optional if the program is not frozen
main()
If there are many files then pool.map() could be replaced by for _ in pool.imap_unordered(safe_run, files): pass.
There is also mutiprocessing.dummy.Pool that provides the same interface as multiprocessing.Pool but uses threads instead of processes that might be more appropriate in this case.
You don't need to keep some CPUs free. Just use a command that starts your executables with a low priority (on Linux it is a nice program).
ThreadPoolExecutor example
concurrent.futures.ThreadPoolExecutor would be both simple and sufficient but it requires 3rd-party dependency on Python 2.x (it is in the stdlib since Python 3.2).
#!/usr/bin/env python
import os
import concurrent.futures
def run(filename, def_param):
... # call external program on `filename`
# populate files
ws = r'D:\Data\Users\jbellino\Project\stJohnsDeepening\model\xsec_a'
wdir = os.path.join(ws, r'fieldgen\reals')
files = (os.path.join(wdir, f) for f in os.listdir(wdir) if f.endswith('.npy'))
# start threads
with concurrent.futures.ThreadPoolExecutor(max_workers=8) as executor:
future_to_file = dict((executor.submit(run, f, ws), f) for f in files)
for future in concurrent.futures.as_completed(future_to_file):
f = future_to_file[future]
if future.exception() is not None:
print('%r generated an exception: %s' % (f, future.exception()))
# run() doesn't return anything so `future.result()` is always `None`
Or if we ignore exceptions raised by run():
from itertools import repeat
... # the same
# start threads
with concurrent.futures.ThreadPoolExecutor(max_workers=8) as executor:
executor.map(run, files, repeat(ws))
# run() doesn't return anything so `map()` results can be ignored
subprocess + threading (manual pool) solution
#!/usr/bin/env python
from __future__ import print_function
import os
import subprocess
import sys
from Queue import Queue
from threading import Thread
def run(filename, def_param):
... # define exe, swt_nam
subprocess.check_call([exe, swt_nam]) # run external program
def worker(queue):
"""Process files from the queue."""
for args in iter(queue.get, None):
try:
run(*args)
except Exception as e: # catch exceptions to avoid exiting the
# thread prematurely
print('%r failed: %s' % (args, e,), file=sys.stderr)
# start threads
q = Queue()
threads = [Thread(target=worker, args=(q,)) for _ in range(8)]
for t in threads:
t.daemon = True # threads die if the program dies
t.start()
# populate files
ws = r'D:\Data\Users\jbellino\Project\stJohnsDeepening\model\xsec_a'
wdir = os.path.join(ws, r'fieldgen\reals')
for f in os.listdir(wdir):
if f.endswith('.npy'):
q.put_nowait((os.path.join(wdir, f), ws))
for _ in threads: q.put_nowait(None) # signal no more files
for t in threads: t.join() # wait for completion
Here is my way to maintain the minimum x number of threads in the memory. Its an combination of threading and multiprocessing modules. It may be unusual to other techniques like respected fellow members have explained above BUT may be worth considerable. For the sake of explanation, I am taking a scenario of crawling a minimum of 5 websites at a time.
so here it is:-
#importing dependencies.
from multiprocessing import Process
from threading import Thread
import threading
# Crawler function
def crawler(domain):
# define crawler technique here.
output.write(scrapeddata + "\n")
pass
Next is threadController function. This function will control the flow of threads to the main memory. It will keep activating the threads to maintain the threadNum "minimum" limit ie. 5. Also it won't exit until, all Active threads(acitveCount) are finished up.
It will maintain a minimum of threadNum(5) startProcess function threads (these threads will eventually start the Processes from the processList while joining them with a time out of 60 seconds). After staring threadController, there would be 2 threads which are not included in the above limit of 5 ie. the Main thread and the threadController thread itself. thats why threading.activeCount() != 2 has been used.
def threadController():
print "Thread count before child thread starts is:-", threading.activeCount(), len(processList)
# staring first thread. This will make the activeCount=3
Thread(target = startProcess).start()
# loop while thread List is not empty OR active threads have not finished up.
while len(processList) != 0 or threading.activeCount() != 2:
if (threading.activeCount() < (threadNum + 2) and # if count of active threads are less than the Minimum AND
len(processList) != 0): # processList is not empty
Thread(target = startProcess).start() # This line would start startThreads function as a seperate thread **
startProcess function, as a separate thread, would start Processes from the processlist. The purpose of this function (**started as a different thread) is that It would become a parent thread for Processes. So when It will join them with a timeout of 60 seconds, this would stop the startProcess thread to move ahead but this won't stop threadController to perform. So this way, threadController will work as required.
def startProcess():
pr = processList.pop(0)
pr.start()
pr.join(60.00) # joining the thread with time out of 60 seconds as a float.
if __name__ == '__main__':
# a file holding a list of domains
domains = open("Domains.txt", "r").read().split("\n")
output = open("test.txt", "a")
processList = [] # thread list
threadNum = 5 # number of thread initiated processes to be run at one time
# making process List
for r in range(0, len(domains), 1):
domain = domains[r].strip()
p = Process(target = crawler, args = (domain,))
processList.append(p) # making a list of performer threads.
# starting the threadController as a seperate thread.
mt = Thread(target = threadController)
mt.start()
mt.join() # won't let go next until threadController thread finishes.
output.close()
print "Done"
Besides maintaining a minimum number of threads in the memory, my aim was to also have something which could avoid stuck threads or processes in the memory. I did this using the time out function.
My apologies for any typing mistake.
I hope this construction would help anyone in this world.
Regards,
Vikas Gautam

Categories

Resources