Python - Limiting the Number of Threads while passing arguments

Python - Limiting the Number of Threads while passing arguments - python

I am trying to run some threads using a thread limiter to keep number of threads to 10. I had an example to use as guide but I need to pass in some arguments to the function when the calling the thread. I am struggling with the passing of arguments. I marked with ### the areas where I am not sure the syntax and where I think my problem is.
I am trying to use this The right way to limit maximum number of threads running at once? as a guide. Here is the sample code I am trying to follow. My example is below that. Any time I try to pass in all the arguments I get back TypeError: __ init__() takes 1 to 6 arguments but 17 were passed In my example below I cut the arguments down to 4 to make it easier to read but I have 17 arguments in my live code. In example I keep the arguments down to run, main_path, target_path, jiranum to keep reading easier
threadLimiter = threading.BoundedSemaphore(maximumNumberOfThreads)
class MyThread(threading.Thread):
def run(self):
threadLimiter.acquire()
try:
self.Executemycode()
finally:
threadLimiter.release()
def Executemycode(self):
print(" Hello World!")
# <your code here>
My code
import os
import sys
import threading
threadLimiter = threading.BoundedSemaphore(10)
class MyThread(threading.Thread):
def run(self): ### I also tried (run, main_path, target_path, jiranum)
threadLimiter.acquire()
try:
self.run_compare(run, main_path, target_path, jiranum) #### I also tried self
finally:
threadLimiter.release()
def run_compare(run, main_path, target_path, jiranum): #### ???
os.chdir(target_path)
os.system(main_path + ', ' + target_path + ',' + jiranum + ',' + run)
if __name__ == '__main__':
#set the needed variables
threads = []
for i in range (1, int(run)+1):
process=threading.Thread(target=MyThread, args=(str(i), main_path, target_path, jiranum)) #### Is this defined right?
process.start()
threads.append(process)
for process in threads:
process.join()

This would probably be a simpler task with concurrent.futures but I like getting my hands dirty, so here we go. A few suggestions:
I find classes as thread targets often complicate things, so if there's no compelling reason, keep it simple
It's easier to use a with block to acquire and release a semaphore, and a regular semaphore usually suffices in that case
17 arguments can get messy; I would build a tuple of the arguments outside the call to threading.Thread() so it's easier to read, then unpack the tuple in the thread
This should work as a simple example; os.system() just echoes something and sleeps, so you can see the thread count is limited by the semaphore.
import os
import threading
from random import randint
threadLimiter = threading.Semaphore(10)
def run_config(*args):
run, arg1, arg2 = args # unpack the 17 args by name
with threadLimiter:
seconds = randint(2, 7)
os.system(f"echo run {run}, args {arg1} {arg2} ; sleep {seconds}")
if __name__ == '__main__':
threads = []
run = "20" # I guess this is a string because of below?
for i in range (1, int(run)+1):
thr_args = (str(i), "arg1",
"arg2") # put the 17 args here
thr = threading.Thread(target=run_config, args=thr_args)
thr.start()
threads.append(thr)
for thr in threads:
thr.join()

Related

Automatically restarting Python sub-processes using identical arguments

I have a python script which calls a series of sub-processes. They need to run "for ever" - but they occasionally die, or get killed. When this happens I need to restart the process using the same arguments as the one which died.
This is a very simplified version:
[edit: this is the less simplified version, which includes "restart" code]
import multiprocessing
import time
import random
def printNumber(number):
print("starting :", number)
while random.randint(0, 5) > 0:
print(number)
time.sleep(2)
if __name__ == '__main__':
children = [] # list
args = {} # dictionary
for processNumber in range(10,15):
p = multiprocessing.Process(
target=printNumber,
args=(processNumber,)
)
children.append(p)
p.start()
args[p.pid] = processNumber
while True:
time.sleep(1)
for n, p in enumerate(children):
if not p.is_alive():
#get parameters dead child was started with
pidArgs = args[p.pid]
del(args[p.pid])
print("n,args,p: ",n,pidArgs,p)
children.pop(n)
# start new process with same args
p = multiprocessing.Process(
target=printNumber,
args=(pidArgs,)
)
children.append(p)
p.start()
args[p.pid] = pidArgs
I have updated the example to illustrate how I want the processes to be restarted if one crashes/killed/etc - keeping track of which pid was started with which args.
Is this the "best" way to do this, or is there a more "python" way of doing this?

I think I would create a separate thread for each Process and use a ProcessPoolExecutor. Executors have a useful function, submit, which returns a Future. You can wait on each Future and re-launch the Executor when the Future is done. Arguments to the function are tracked as class variables, so restarting is just a simple loop.
import threading
from concurrent.futures import ProcessPoolExecutor
import time
import random
import traceback
def printNumber(number):
print("starting :", number)
while random.randint(0, 5) > 0:
print(number)
time.sleep(2)
class KeepRunning(threading.Thread):
def __init__(self, func, *args, **kwds):
self.func = func
self.args = args
self.kwds = kwds
super().__init__()
def run(self):
while True:
with ProcessPoolExecutor(max_workers=1) as pool:
future = pool.submit(self.func, *self.args, **self.kwds)
try:
future.result()
except Exception:
traceback.print_exc()
if __name__ == '__main__':
for process_number in range(10, 15):
keep = KeepRunning(printNumber, process_number)
keep.start()
while True:
time.sleep(1)
At the end of the program is a loop to keep the main thread running. Without that, the program will attempt to exit while your Processes are still running.

For the example you provided I would just remove the exit condition from the while loop and change it to True.
As you said though the actual code is more complicated (why didn't you post that?). So if the process gets terminated by lets say an exception just put the code inside a try catch block. You can then put said block in an infinite loop.
I hope this is what you are looking for but that seems to be the right way to do it provided the goal and information you provided.

Instead of just starting the process immediately, you can save the list of processes and their arguments, and create another process that checks they are alive.
For example:
if __name__ == '__main__':
process_list = []
for processNumber in range(5):
process = multiprocessing.Process(
target=printNumber,
args=(processNumber,)
)
process_list.append((process,args))
process.start()
while True:
for running_process, process_args in process_list:
if not running_process.is_alive():
new_process = multiprocessing.Process(target=printNumber, args=(process_args))
process_list.remove(running_process, process_args) # Remove terminated process
process_list.append((new_process, process_args))
I must say that I'm not sure the best way to do it is in python, you may want to look at scheduler services like jenkins or something like that.

How can I check that the Process class from Python Multiprocessing has worked?

I've written the following code which runs a function that simulates a stochastic simulation of a series of chemical reactions. I've written the following code:
v = range(1, 51)
def parallelfunc(*v):
gillespie_tau_leaping(start_state, LHS, stoch_rate, state_change_array)
def info(title):
print(title)
print('module name:', __name__)
print('parent process:', os.getppid())
print('process id:', os.getpid())
if __name__ == '__main__':
info('main line')
start = datetime.utcnow()
p = Process(target=parallelfunc, args=(v))
p.start()
p.join()
end = datetime.utcnow()
sim_time = end - start
print(f"Simualtion utc time:\n{sim_time}")
I'm using the Process method from the multiprocessing library and am trying to run gillespie_tau_leaping 50 times.
Only I'm not sure if its working. gillespie_tau_leaping prints out a number of values to the terminal, but these values are only printed out once, I'd expect them to be printed out 50 times.
I tried using the getpid etc command and this returns the following to the terminal:
main line
module name: __main__
parent process: 6188
process id: 27920
How can I tell if my code as worked and how can I get it to print the values from gillepsie_tau_leaping 50 times to the terminal?
Cheers

Your code is running just one process, the call to Process, spawns a new thread but you are doing it only once (not in a loop).
I would suggest you to use multiprocessing pools
Your code can be something like this:
from multiprocess import Pool
def parallelfunc(*args):
do_something()
def main():
# create a list of list of args for the function invocation
func_args = [['arg1call1', 'arg2call1', 'arg3call1'], ['arg1call2', 'arg2call2', 'arg3call2']]
with Pool() as p:
results = p.map(parallelfunc, func_args)
# do something with results which is a list of results
multiprocessing pool by default create the same number of processes as your CPU cores and manage the process Pool till the end of the processing taking care of all the Inter Process Communication.
This is really handy because synchronizing processes can be hard.
Hope this helps

Python: How to make program wait till function's or method's completion

Often there is a need for the program to wait for a function to complete its work. Sometimes it is opposite: there is no need for a main program to wait.
I've put a simple example. There are four buttons. Clicking each will call the same calculate() function. The only difference is the way the function is called.
"Call Directly" button calls calculate() function directly. Since there is a 'Function End' print out it is evident that the program is waiting for the calculate function to complete its job.
"Call via Threading" calls the same function this time using threading mechanism. Since the program prints out ': Function End' message immidiately after the button is presses I can conclude the program doesn't wait for calculate() function to complete. How to override this behavior? How to make program wait till calculate() function is finished?
"Call via Multiprocessing" buttons utilizes multiprocessing to call calculate() function.
Just like with threading multiprocessing doesn't wait for function completion. What statement we have to put in order to make it wait?
"Call via Subprocess" buttons doesn't do anything since I didn't figure out the way to hook subprocess to run internal script function or method. It would be interesting to see how to do it...
Example:
from PyQt4 import QtCore, QtGui
app = QtGui.QApplication(sys.argv)
def calculate(listArg=None):
print '\n\t Starting calculation...'
m=0
for i in range(50000000):
m+=i
print '\t ...calculation completed\n'
class Dialog_01(QtGui.QMainWindow):
def __init__(self):
super(Dialog_01, self).__init__()
myQWidget = QtGui.QWidget()
myBoxLayout = QtGui.QVBoxLayout()
directCall_button = QtGui.QPushButton("Call Directly")
directCall_button.clicked.connect(self.callDirectly)
myBoxLayout.addWidget(directCall_button)
Button_01 = QtGui.QPushButton("Call via Threading")
Button_01.clicked.connect(self.callUsingThreads)
myBoxLayout.addWidget(Button_01)
Button_02 = QtGui.QPushButton("Call via Multiprocessing")
Button_02.clicked.connect(self.callUsingMultiprocessing)
myBoxLayout.addWidget(Button_02)
Button_03 = QtGui.QPushButton("Call via Subprocess")
Button_03.clicked.connect(self.callUsingSubprocess)
myBoxLayout.addWidget(Button_03)
myQWidget.setLayout(myBoxLayout)
self.setCentralWidget(myQWidget)
self.setWindowTitle('Dialog 01')
def callUsingThreads(self):
print '------------------------------- callUsingThreads() ----------------------------------'
import threading
self.myEvent=threading.Event()
self.c_thread=threading.Thread(target=calculate)
self.c_thread.start()
print "\n\t\t : Function End"
def callUsingMultiprocessing(self):
print '------------------------------- callUsingMultiprocessing() ----------------------------------'
from multiprocessing import Pool
pool = Pool(processes=3)
try: pool.map_async( calculate, ['some'])
except Exception, e: print e
print "\n\t\t : Function End"
def callDirectly(self):
print '------------------------------- callDirectly() ----------------------------------'
calculate()
print "\n\t\t : Function End"
def callUsingSubprocess(self):
print '------------------------------- callUsingSubprocess() ----------------------------------'
import subprocess
print '-missing code solution'
print "\n\t\t : Function End"
if __name__ == '__main__':
dialog_1 = Dialog_01()
dialog_1.show()
dialog_1.resize(480,320)
sys.exit(app.exec_())

Use a queue: each thread when completed puts the result on the queue and then you just need to read the appropriate number of results and ignore the remainder:
#!python3.3
import queue # For Python 2.x use 'import Queue as queue'
import threading, time, random
def func(id, result_queue):
print("Thread", id)
time.sleep(random.random() * 5)
result_queue.put((id, 'done'))
def main():
q = queue.Queue()
threads = [ threading.Thread(target=func, args=(i, q)) for i in range(5) ]
for th in threads:
th.daemon = True
th.start()
result1 = q.get()
result2 = q.get()
print("Second result: {}".format(result2))
if __name__=='__main__':
main()
Documentation for Queue.get() (with no arguments it is equivalent to Queue.get(True, None):
Queue.get([block[, timeout]])
Remove and return an item from the queue. If optional args block is true and timeout is None (the default), block if necessary until an item is available. If timeout is a positive number, it blocks at most timeout seconds and raises the Empty exception if no item was available within that time. Otherwise (block is false), return an item if one is immediately available, else raise the Empty exception (timeout is ignored in that case).
How to wait until only the first thread is finished in Python
You can to use .join() method too.
what is the use of join() in python threading

I find that using the "pool" submodule within "multiprocessing" works amazingly for executing multiple processes at once within a Python Script.
See Section: Using a pool of workers
Look carefully at "# launching multiple evaluations asynchronously may use more processes" in the example. Once you understand what those lines are doing, the following example I constructed will make a lot of sense.
import numpy as np
from multiprocessing import Pool
def desired_function(option, processes, data, etc...):
# your code will go here. option allows you to make choices within your script
# to execute desired sections of code for each pool or subprocess.
return result_array # "for example"
result_array = np.zeros("some shape") # This is normally populated by 1 loop, lets try 4.
processes = 4
pool = Pool(processes=processes)
args = (processes, data, etc...) # Arguments to be passed into desired function.
multiple_results = []
for i in range(processes): # Executes each pool w/ option (1-4 in this case).
multiple_results.append(pool.apply_async(param_process, (i+1,)+args)) # Syncs each.
results = np.array(res.get() for res in multiple_results) # Retrieves results after
# every pool is finished!
for i in range(processes):
result_array = result_array + results[i] # Combines all datasets!
The code will basically run the desired function for a set number of processes. You will have to carefully make you're function can distinguish between each process (hence why I added the variable "option".) Additionally, it doesn't have to be an array that is being populated in the end, but for my example thats how I used it. Hope this simplifies or helps you better understand the power of multiprocessing in Python!

python multicore queue randomly hangs for no reason, despite queue size being tiny

in python here is my multiprocessing setup. I subclassed the Process method and gave it
a queue and some other fields for pickling/data purposes.
This strategy works about 95% of the time, the other 5% for an unknown reason the queue just hangs and it never finishes (it's common that 3 of the 4 cores finish their jobs and the last one takes forever so I have to just kill the job).
I am aware that queue's have a fixed size in python, or they will hang. My queue only stores one character strings... the id of the processor, so it can't be that.
Here is the exact line where my code halts:
res = self._recv()
Does anyone have ideas? The formal code is below.
Thank you.
from multiprocessing import Process, Queue
from multiprocessing import cpu_count as num_cores
import codecs, cPickle
class Processor(Process):
def __init__(self, queue, elements, process_num):
super(Processor, self).__init__()
self.queue = queue
self.elements = elements
self.id = process_num
def job(self):
ddd = []
for l in self.elements:
obj = ... heavy computation ...
dd = {}
dd['data'] = obj.data
dd['meta'] = obj.meta
ddd.append(dd)
cPickle.dump(ddd, codecs.open(
urljoin(TOPDIR, self.id+'.txt'), 'w'))
return self.id
def run(self):
self.queue.put(self.job())
if __name__=='__main__':
processes = []
for i in range(0, num_cores()):
q = Queue()
p = Processor(q, divided_work(), process_num=str(i))
processes.append((p, q))
p.start()
for val in processes:
val[0].join()
key = val[1].get()
storage = urljoin(TOPDIR, key+'.txt')
ddd = cPickle.load(codecs.open(storage , 'r'))
.. unpack ddd process data ...

Do a time.sleep(0.001) at the beginning of your run() method.

From my experience
time.sleep(0.001)
Is by far not long enough.
I had a similar problem. It seems to happen if you call get() or put() on a queue "too early". I guess it somehow fails to initialize quick enough. Not entirely sure, but I'm speculating that it might have something to do with the ways a queue might use the underlying operating system to pass messages. It started happening to me after I started using BeautifulSoup and lxml and it affected totally unrelated code.
My solution is a little big ugly but it's simple and it works:
import time
def run(self):
error = True
while error:
self.queue.put(self.job())
error = False
except EOFError:
print "EOFError. retrying..."
time.sleep(1)
On my machine it usually retries twice during application start-up and afterwards never again. You need to do that inside of sender AND receiver since this error will occur on both sides.

Different way to implement this threading?

I'm trying out threads in python. I want a spinning cursor to display while another method runs (for 5-10 mins). I've done out some code but am wondering is this how you would do it? i don't like to use globals, so I assume there is a better way?
c = True
def b():
for j in itertools.cycle('/-\|'):
if (c == True):
sys.stdout.write(j)
sys.stdout.flush()
time.sleep(0.1)
sys.stdout.write('\b')
else:
return
def a():
global c
#code does stuff here for 5-10 minutes
#simulate with sleep
time.sleep(2)
c = False
Thread(target = a).start()
Thread(target = b).start()
EDIT:
Another issue now is that when the processing ends the last element of the spinning cursor is still on screen. so something like \ is printed.

You could use events:
http://docs.python.org/2/library/threading.html
I tested this and it works. It also keeps everything in sync. You should avoid changing/reading the same variables in different threads without synchronizing them.
#!/usr/bin/python
from threading import Thread
from threading import Event
import time
import itertools
import sys
def b(event):
for j in itertools.cycle('/-\|'):
if not event.is_set():
sys.stdout.write(j)
sys.stdout.flush()
time.sleep(0.1)
sys.stdout.write('\b')
else:
return
def a(event):
#code does stuff here for 5-10 minutes
#simulate with sleep
time.sleep(2)
event.set()
def main():
c = Event()
Thread(target = a, kwargs = {'event': c}).start()
Thread(target = b, kwargs = {'event': c}).start()
if __name__ == "__main__":
main()
Related to 'kwargs', from Python docs (URL in the beginning of the post):
class threading.Thread(group=None, target=None, name=None, args=(), kwargs={})
...
kwargs is a dictionary of keyword arguments for the target invocation. Defaults to {}.

You're on the right track mostly, except for the global variable. Normally you'd needed to coordinate access to shared data like that with a lock or semaphore, but in this special case you can take a short-cut and just use whether one of the threads is running or not instead. This is what I mean:
from threading import Thread
from threading import Event
import time
import itertools
import sys
def monitor_thread(watched_thread):
chars = itertools.cycle('/-\|')
while watched_thread.is_alive():
sys.stdout.write(chars.next())
sys.stdout.flush()
time.sleep(0.1)
sys.stdout.write('\b')
def worker_thread():
# code does stuff here - simulated with sleep
time.sleep(2)
if __name__ == "__main__":
watched_thread = Thread(target=worker_thread)
watched_thread.start()
Thread(target=monitor_thread, args=(watched_thread,)).start()

This is not properly synchronized. But I will not try to explain it all to you right now because it's a whole lot of knowledge. Try to read this: http://effbot.org/zone/thread-synchronization.htm
But in your case it's not that bad that things aren't synchronized correctyl. The only thing that could happen, is that the spining bar spins a few ms longer than the background task actually needs.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Python - Limiting the Number of Threads while passing arguments - python

Related

Automatically restarting Python sub-processes using identical arguments

How can I check that the Process class from Python Multiprocessing has worked?

Python: How to make program wait till function's or method's completion

python multicore queue randomly hangs for no reason, despite queue size being tiny

Different way to implement this threading?

Categories

Resources