Simple example of multiprocessing and multi-threading - Python

I have the following code:
class SplunkUKAnalyser(object):
    def __init__
    def method1
    def method2
    def method3
    ...

class SplunkDEAnalyser(SplunkUKAnalyser):
    def __init__ (overridden)
    def method1 (overridden)
    def method2
    def method3
    ...

def perform_uk_analysis():
    my_uk_analyser = SplunkUKAnalyser()

def perform_de_analysis():
    my_de_analyser = SplunkDEAnalyser()
It all works well if I just execute the below:
perform_uk_analysis()
perform_de_analysis()
How can I make it so that the last two statements are executed concurrently (using multiprocessing and/or multi-threading)?
From my tests it seems that the second statement executes even though the first statement has not finished completely, but I would like to incorporate true concurrency.
Any other additional advice is much appreciated.
Many thanks in advance.

Because of the GIL (Global Interpreter Lock) you cannot achieve 'true concurrency' with threading.
However, using multiprocessing to concurrently run multiple tasks is easy:
import multiprocessing
process1 = multiprocessing.Process(target=perform_uk_analysis)
process2 = multiprocessing.Process(target=perform_de_analysis)
# you can optionally daemonize the process
process2.daemon = True
# run the tasks concurrently
process1.start()
process2.start()
# you can optionally wait for a process to finish
process2.join()
For tasks that run the same function with different arguments, consider using multiprocessing.Pool, an even more convenient solution.
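For example, a minimal sketch (the analyse function and its argument list are placeholders, not code from the question):
import multiprocessing

def analyse(country):
    # placeholder for the real per-country analysis
    return 'analysed %s' % country

if __name__ == '__main__':
    with multiprocessing.Pool() as pool:  # one worker per CPU core by default
        results = pool.map(analyse, ['uk', 'de'])
    print(results)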

Related

Will this cause a deadlock or a bad pattern?

Will the following way of using a thread pool cause a deadlock? Or is such a pattern not preferred? If so, what is the alternative?
Passing the pool to a function that is run in a thread, which in turn invokes a function that is run in the same pool:
from concurrent.futures import ThreadPoolExecutor
from time import sleep

def bar():
    sleep(2)
    return 2

def foo(pool):
    sleep(2)
    my_list = [pool.submit(bar) for i in range(4)]
    return [i.result() for i in my_list]

pool = ThreadPoolExecutor(10)
my_list = [pool.submit(foo, pool) for i in range(2)]
for i in my_list:
    print(i.result())
This would be a safe way to spawn a thread from within a thread that itself was initiated by ThreadPoolExecutor. This may not be necessary if ThreadPoolExecutor itself is thread-safe. The output shows how, in this case, there would be 10 concurrent threads.
from concurrent.futures import ThreadPoolExecutor
from time import sleep

BAR_THREADS = 4
FOO_THREADS = 2

def bar(_):
    print('Running bar')
    sleep(1)

def foo(_):
    print('Running foo')
    with ThreadPoolExecutor(max_workers=BAR_THREADS) as executor:
        executor.map(bar, range(BAR_THREADS))

with ThreadPoolExecutor(max_workers=FOO_THREADS) as executor:
    executor.map(foo, range(FOO_THREADS))
print('Done')
Output:
Running foo
Running foo
Running bar
Running bar
Running bar
Running bar
Running bar
Running bar
Running bar
Running bar
Done
Will the following way of using a thread pool cause a deadlock? ... If so, what is the alternative?
One alternative would be to use a thread pool that does not have a hard limit on the number of workers. Unfortunately, the concurrent.futures.ThreadPoolExecutor class is not that sophisticated: you would either have to write your own or find one provided by a third party. (I'm not a big-time Python programmer, so I don't know of one off-hand.)
A naive alternative thread-pool might create a new worker any time submit() was called and all of the existing workers were busy. On the other hand, that could make it easy for you to run the program out of memory by creating too many threads. A slightly more sophisticated thread pool might also kill off a worker if too many other workers were idle at the moment when the worker completed its task.
More sophisticated strategies are possible, but you might have to think more deeply about the needs and patterns-of-use of the application before writing the code.
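As a rough illustration of that naive strategy, here is a sketch of an "executor" that simply spawns a fresh thread for every submit() call, so a task that waits on tasks it submitted can never starve for workers (the class name is made up and there is no limit on thread count, so treat it as an illustration only):
import threading
from concurrent.futures import Future

class UnboundedExecutor:
    def submit(self, fn, *args, **kwargs):
        future = Future()
        def runner():
            try:
                future.set_result(fn(*args, **kwargs))
            except Exception as exc:
                future.set_exception(exc)
        # one brand-new worker thread per submitted task
        threading.Thread(target=runner, daemon=True).start()
        return future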

Python: CPU intensive tasks on multiple threads

Suppose I have this class:
class Foo:
    def __init__(self):
        self.task1_dict = {}
        self.task2_dict = {}

    def task1(self):
        for i in range(10000000):
            pass  # update self.task1_dict

    def task2(self):
        for i in range(10000000):
            pass  # update self.task2_dict

    def run(self):
        self.task1()
        self.task2()
Task 1 and task 2 are both CPU intensive tasks and are non-IO. They are also independent so you can assume that running them concurrently is thread safe.
For now, my class is running the tasks sequentially and I want to change it so the tasks are run in parallel in multiple threads. I'm using the ThreadPoolExecutor from the concurrent.futures package.
class Foo:
    ...
    def run(self):
        with ThreadPoolExecutor() as executor:
            executor.submit(self.task1)
            executor.submit(self.task2)
The problem is that when I call the run method, the run time does not decrease at all and even slightly increases compared to the sequential version. I'm guessing this is because of the GIL allowing only one thread to run at a time. Is there any way I can parallelise this program? Maybe a way to overcome the GIL and run the two methods on two threads? I have considered switching to ProcessPoolExecutor, but I cannot call the methods since class methods are not picklable. Also, if I use multiprocessing, Python will create multiple instances of Foo, and self.task1_dict and self.task2_dict would not be updated accordingly.
You can use multiprocessing shared memory as explained here
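For instance, a minimal sketch using a multiprocessing.Manager dict as the shared object (the loop bodies are placeholders for the real work; note that a Manager proxies data between processes rather than sharing raw memory, which is what the lower-level multiprocessing.shared_memory module provides):
from multiprocessing import Manager, Process

def task1(shared):
    d = {}
    for i in range(10000000):
        d[i % 10] = i        # placeholder CPU-bound work
    shared['task1'] = d      # publish the result through the manager proxy

def task2(shared):
    d = {}
    for i in range(10000000):
        d[i % 10] = i * i    # placeholder CPU-bound work
    shared['task2'] = d

if __name__ == '__main__':
    with Manager() as manager:
        shared = manager.dict()
        p1 = Process(target=task1, args=(shared,))
        p2 = Process(target=task2, args=(shared,))
        p1.start()
        p2.start()
        p1.join()
        p2.join()
        task1_dict = dict(shared['task1'])
        task2_dict = dict(shared['task2'])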

Process becomes zombie - Python3 Multiprocessing

First of all, I shall explain the structure of my scripts:
Script1 --(call)--> a function in Script2, and this function further calls 12-15 functions in different scripts through multiprocessing. Some of those functions have infinite loops and some use threading.
The functions that contain threading in turn call some functions that have infinite loops in them.
The multiprocessing code looks like this:
Script1.py
def func1():
    # some infinite functionality

Script2.py
def func2():
    # some infinite functionality

Script3.py
def func3():
    # thread calling
    # ..
    # ..
# and so on...

from multiprocessing import Process
from Script1 import func1
from Script2 import func2
from Script3 import func3

process = []
process.append(Process(target=func1))
process.append(Process(target=func2))
process.append(Process(target=func3))
process.append(Process(target=func4))
process.append(Process(target=func5))
process.append(Process(target=func6))
# ..
# ..
# and so on.

for p in process:
    p.start()
    print("process started:", p)

for p in process:
    p.join()
    print("process joined:", p)
Now the issues I faced are:
First, it only prints the first "process joined" statement even though all processes started successfully (which means join does not execute for all processes).
Second, some processes become zombies when run with multiprocessing, yet the same functions work very well when run externally without multiprocessing (so I know the functions themselves are fine and join cannot work properly because of some other issue).
Third, in my situation should I use multiprocessing.Pool, or does multiprocessing.Process work fine for me? Any suggestion?
Fourth, I want to know if there are any alternatives to this besides multithreading - multiprocessing.Pool maybe?
I also tried a different thing, but it did not work:
multiprocessing.set_start_method('spawn')
because, as far as I know, it runs every process individually and externally, like running scripts externally through popen.
Note:
I am using Ubuntu 16.04. I have a scenario where I can use multiprocessing.Process, multiprocessing.Pool or anything that works like multiprocessing, but I cannot use multithreading here.

Basic multiprocessing with Python

I have found information on multiprocessing and multithreading in Python, but I don't understand the basic concepts, and all the examples that I found are more difficult than what I'm trying to do.
I have X independent programs that I need to run. I want to launch the first Y programs (where Y is the number of cores of my computer and X>>Y). As soon as one of the independent programs is done, I want the next program to run in the next available core. I thought that this would be straightforward, but I keep getting stuck on it. Any help in solving this problem would be much appreciated.
Edit: Thanks a lot for your answers. I also found another solution using the joblib module that I wanted to share. Suppose that you have a script called 'program.py' that you want to run with different combinations of the input parameters (a0, b0, c0) and you want to use all your cores. This is a solution.
import os
from numpy import arange
from joblib import Parallel, delayed

a0 = arange(0.1, 1.1, 0.1)
b0 = arange(-1.5, -0.4, 0.1)
c0 = arange(1., 5., 0.1)

params = []
for i in range(len(a0)):
    for j in range(len(b0)):
        for k in range(len(c0)):
            params.append((a0[i], b0[j], c0[k]))

def func(parameters):
    s = 'python program.py %g %g %g' % (parameters[0], parameters[1], parameters[2])
    command = os.system(s)
    return command

output = Parallel(n_jobs=-1, verbose=1000)(delayed(func)(i) for i in params)
You want to use multiprocessing.Pool, which represents a "pool" of workers (default one per core, though you can specify another number) that do your jobs. You then submit jobs to the pool, and the workers handle them as they become available. The easiest function to use is Pool.map, which runs a given function for each of the arguments in the passed sequence, and returns the result for each argument. If you don't need return values, you could also use apply_async in a loop.
import multiprocessing

def do_work(arg):
    pass  # do whatever you actually want to do

def run_battery(args):
    # args should be like [arg1, arg2, ...]
    pool = multiprocessing.Pool()
    ret_vals = pool.map(do_work, args)
    pool.close()
    pool.join()
    return ret_vals
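If you don't need the return values, a minimal sketch of the apply_async variant mentioned above might look like this (do_work is the same placeholder as in the snippet above):
import multiprocessing

def do_work(arg):
    pass  # placeholder for the real per-job work

if __name__ == '__main__':
    pool = multiprocessing.Pool()
    for arg in [1, 2, 3]:
        pool.apply_async(do_work, (arg,))  # fire and forget; results are discarded
    pool.close()
    pool.join()  # wait for all submitted jobs to finish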
If you're trying to call external programs and not just Python functions, use subprocess. For example, this will call cmd_name with the list of arguments passed, raise an exception if the return code isn't 0, and return the output:
import subprocess

def do_work(subproc_args):
    return subprocess.check_output(['cmd_name'] + list(subproc_args))
Hi, I'm using the QThread object from PyQt.
From what I understood, while your thread is running it can only use its own variables and procedures; it cannot change your main object's variables.
So before you run it, be sure to define all the QThread variables you will need,
like this for example:
class Worker(QThread):
    def define(self, phase):
        print 'define'
        self.phase = phase
        self.start()  # will run your thread

    def continueJob(self):
        self.start()

    def run(self):
        self.launchProgramme(self.phase)
        self.phase += 1

    def launchProgramme(self, phase):
        print phase
I'm not well aware of how basic Python threads work, but in PyQt your thread emits a signal
to your main object, like this:
class mainObject(QtGui.QMainWindow):
    def __init__(self):
        super(mainObject, self).__init__()
        self.numberProgramme = 4
        self.thread = Worker()
        # connect the thread's signals
        self.connect(self.thread, QtCore.SIGNAL("finished()"), self.threadStopped)
        self.connect(self.thread, QtCore.SIGNAL("terminated()"), self.threadStopped)
Connected like this, when thread.run finishes it will call your threadStopped method in your main object, where you can get the values of your thread's variables:
def threadStopped(self):
    value = self.thread.phase
    if value < self.numberProgramme:
        self.thread.continueJob()
After that you just have to launch another thread or not, depending on the value you get.
This is for PyQt threading of course; with basic Python threads, the way to execute threadStopped could be different.

Simplifying threading in Python

I am looking for a way to ease my threaded code.
There are a lot of places in my code where I do something like:
for arg in array:
    t = Thread(target=myFunction, args=(arg,))
    t.start()
i.e running the same function, each time for different parameters, in threads.
This is of course a simplified version of the real code, and usually the code inside the for loop is ~10-20 lines long, that cannot be made simple by using one auxiliary function like myFunction in the example above (had that been the case, I could've just used a thread pool).
Also, this scenario is very, very common in my code, so there are tons of lines which I consider redundant. It would help me a lot if I didn't need to handle all this boilerplate code, but instead be able to do something like:
for arg in array:
    with threaded():
        myFunction(arg)
i.e somehow threaded() takes every line of code inside it and runs it in a separate thread.
I know that context managers aren't supposed to be used in such situations, that it's probably a bad idea and will require an ugly hack, but nonetheless - can it be done, and how?
How about this:
for arg in array:
    def _thread():
        # code here
        print arg
    t = Thread(target=_thread)
    t.start()
Additionally, with decorators, you can sugar it up a little:
def spawn_thread(func):
    t = Thread(target=func)
    t.start()
    return t

for arg in array:
    @spawn_thread
    def _thread():
        # code here
        print arg
Would a thread pool help you here? Many implementations for Python exist, for example this one.
P.S: still interested to know what your exact use-case is
What you want is a kind of "contextual thread pool".
Take a look at the ThreadPool class in this module, designed to be used in a manner similar to the one you've given. Use would be something like this:
with ThreadPool() as pool:
    for arg in array:
        pool.add_thread(target=myFunction, args=[arg])
Failures in any task given to a ThreadPool will flag an error, and perform the standard error backtrace handling.
I think you're over-complicating it. This is the "pattern" I use:
# util.py
import threading

def start_thread(func, *args):
    thread = threading.Thread(target=func, args=args)
    thread.setDaemon(True)
    thread.start()
    return thread

# in another module
import util
...

for arg in array:
    util.start_thread(myFunction, arg)
I don't see the big deal about having to create myFunction. You could even define the function inline with the function that starts it.
def do_stuff():
    def thread_main(arg):
        print "I'm a new thread with arg=%s" % arg

    for arg in array:
        util.start_thread(thread_main, arg)
If you're creating a large number of threads, a thread pool definitely makes more sense. You can easily make your own with the Queue and threading modules. Basically create a jobs queue, create N worker threads, give each thread a "pointer" to the queue and have them pull jobs from the queue and process them.
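A rough Python 3 sketch of that hand-rolled pool (the names and the (function, argument) job format are illustrative, not from the original answer):
import threading
from queue import Queue

def worker(jobs):
    while True:
        func, arg = jobs.get()       # block until a job is available
        try:
            func(arg)
        finally:
            jobs.task_done()

jobs = Queue()
for _ in range(4):                   # N worker threads
    threading.Thread(target=worker, args=(jobs,), daemon=True).start()

for arg in range(10):
    jobs.put((print, arg))           # enqueue jobs as (function, argument) pairs

jobs.join()                          # block until every queued job is marked done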
