Python - Fork Modules

My requirement is to do something like the following:
def task_a():
    ...
    ...
    return a1
def task_b():
    ...
    ...
    return b1
.
.
def task_z():
    ...
    ...
    return z1
Now in my main code I want to execute tasks a..z in parallel and then wait for the return values of all of the above:
a = task_a()
b = task_b()
z = task_z()
Is there a way to call the above modules in parallel in Python?
Thanks,
Manish

Reference:
Python: How can I run python functions in parallel?
Import:
from multiprocessing import Process
Add new function:
def runInParallel(*fns):
    proc = []
    for fn in fns:
        p = Process(target=fn)
        p.start()
        proc.append(p)
    for p in proc:
        p.join()
Pass the existing functions into the new function:
runInParallel(task_a, task_b, task_c...task_z)
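Note that runInParallel only waits for the processes to finish; it does not capture the return values that the question asks about. A minimal sketch of one way to also collect results, using multiprocessing.Pool (the helper names _call and collect_results and the two sample tasks are illustrative, not part of the original answer):
from multiprocessing import Pool

def task_a():
    return "a1"

def task_b():
    return "b1"

def _call(fn):
    # Helper so Pool.map can invoke each task function; illustrative only.
    return fn()

def collect_results(*fns):
    # Run every task in its own worker process and gather the return values.
    with Pool(processes=len(fns)) as pool:
        return pool.map(_call, fns)

if __name__ == "__main__":
    a, b = collect_results(task_a, task_b)
    print(a, b)  # a1 b1
Pool.map returns the results in the same order as the functions were passed in.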

Related

python multiprocessing running same function with different args all stopped except for the last one

I'm executing a function multiple times using multiprocessing. The function iterates over part of a directory, split according to the number of CPU cores, and prints the name of each file. All of the processes stop except for the last one.
Code:
def worker(each_cpu_work_len, cpu_i):
    dirs = some_path.files()[cpu_i - 1 * each_cpu_work_len:cpu_i * each_cpu_work_len]
    for a_file_in_dir in dirs:
        name = a_file_in_dir.basename()
        print(cpu_i, name)

def main():
    dirs_len = len(some_path.files())
    each_cpu_work_len = ceil(dirs_len / multiprocessing.cpu_count())
    procs = list()
    for cpu_i in range(1, multiprocessing.cpu_count() + 1):
        proc = multiprocessing.Process(
            target=worker,
            args=(each_cpu_work_len, cpu_i)
        )
        procs.append(proc)
        proc.start()
    for proc in procs:
        proc.join()

if __name__ == "__main__":
    main()
Results:
4 some_file1.txt
4 some_file2.txt
4 some_file3.txt
4 some_file4.txt
.
.
.
I get the same results with multithreading. If you have an answer, please also mention how I can implement something like multiprocessing.Manager().list() to pass to these functions (so that they can append some values to this server-side list).
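There is no accepted answer here, but a likely culprit is operator precedence in the slice: cpu_i - 1 * each_cpu_work_len evaluates as cpu_i - each_cpu_work_len, not (cpu_i - 1) * each_cpu_work_len, so most chunks come out empty or wrong. A hedged sketch of the corrected slice together with a Manager().list() that the workers append to (the hard-coded file list stands in for the question's some_path.files() and is only illustrative):
import multiprocessing
from math import ceil

def worker(files, each_cpu_work_len, cpu_i, shared_names):
    # Note the parentheses: (cpu_i - 1) * each_cpu_work_len, not cpu_i - 1 * ...
    start = (cpu_i - 1) * each_cpu_work_len
    for a_file in files[start:start + each_cpu_work_len]:
        shared_names.append((cpu_i, a_file))
        print(cpu_i, a_file)

def main():
    files = ["f1.txt", "f2.txt", "f3.txt", "f4.txt", "f5.txt"]  # stand-in for some_path.files()
    each_cpu_work_len = ceil(len(files) / multiprocessing.cpu_count())
    with multiprocessing.Manager() as manager:
        shared_names = manager.list()  # server-side list the workers can append to
        procs = []
        for cpu_i in range(1, multiprocessing.cpu_count() + 1):
            proc = multiprocessing.Process(
                target=worker,
                args=(files, each_cpu_work_len, cpu_i, shared_names),
            )
            procs.append(proc)
            proc.start()
        for proc in procs:
            proc.join()
        print(list(shared_names))  # copy out before the manager shuts down

if __name__ == "__main__":
    main()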

Sharing large arbitrary data structures during multiprocessing

I would like to parallelize a process in python which needs read access to several large, non-array data structures. What would be a recommended way to do this without copying all of the large data structures into every new process?
Thank you
The multiprocessing package provides two ways of sharing state: shared memory objects and server process managers. You should use server process managers as they support arbitrary object types.
The following program makes use of a server process manager:
#!/usr/bin/env python3
from multiprocessing import Process, Manager

# Simple data structure
class DataStruct:
    data_id = None
    data_str = None

    def __init__(self, data_id, data_str):
        self.data_id = data_id
        self.data_str = data_str

    def __str__(self):
        return f"{self.data_str} has ID {self.data_id}"

    def __repr__(self):
        return f"({self.data_id}, {self.data_str})"

    def set_data_id(self, data_id):
        self.data_id = data_id

    def set_data_str(self, data_str):
        self.data_str = data_str

    def get_data_id(self):
        return self.data_id

    def get_data_str(self):
        return self.data_str

# Create function to manipulate data
def manipulate_data_structs(data_structs, find_str):
    for ds in data_structs:
        if ds.get_data_str() == find_str:
            print(ds)

if __name__ == "__main__":
    # Create manager context, modify the data
    with Manager() as manager:
        # List of DataStruct objects
        l = manager.list([
            DataStruct(32, "Andrea"),
            DataStruct(45, "Bill"),
            DataStruct(21, "Claire"),
        ])
        # Processes that look for DataStructs with a given string
        procs = [
            Process(target=manipulate_data_structs, args=(l, "Andrea")),
            Process(target=manipulate_data_structs, args=(l, "Claire")),
            Process(target=manipulate_data_structs, args=(l, "David")),
        ]
        for proc in procs:
            proc.start()
        for proc in procs:
            proc.join()
For more information, see Sharing state between processes in the documentation.
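For contrast, the other mechanism mentioned above, shared memory, only handles ctypes-backed values such as multiprocessing.Value and multiprocessing.Array rather than arbitrary Python objects. A minimal sketch (the counter/array example is illustrative, not from the original answer):
from multiprocessing import Process, Value, Array

def worker(counter, values, idx):
    # Shared memory holds plain C types, not arbitrary Python objects.
    with counter.get_lock():
        counter.value += 1
    values[idx] = idx * 1.5

if __name__ == "__main__":
    counter = Value("i", 0)            # shared C int
    values = Array("d", [0.0] * 4)     # shared array of C doubles
    procs = [Process(target=worker, args=(counter, values, i)) for i in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    print(counter.value, values[:])    # 4 [0.0, 1.5, 3.0, 4.5]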

python ProcessPoolExecutor do not work when in function

ProcessPoolExecutor works when run from the command line, but it stops working after the code is moved into a function.
It works like this:
from concurrent import futures

def multi_process(func, paras, threads):
    with futures.ProcessPoolExecutor(max_workers=threads) as pool:
        res = pool.map(func, paras, chunksize=threads)
        return list(res)

p = multi_process(func, paras, threads)
but it does not work at all when written like this:
def upper(paras, threads):
    def func(para):
        ...  # some func
    def multi_process(func, paras, threads):
        with futures.ProcessPoolExecutor(max_workers=threads) as pool:
            res = pool.map(func, paras, chunksize=threads)
            return list(res)
    p = multi_process(func, paras, threads)
    return p

p = upper(paras, threads)
There is no warning or error, but there is also no response for a long time.
You do get an error. It is:
AttributeError: Can't pickle local object 'upper.<locals>.func'
The reason is that, for multiprocessing to work, the function needs to be defined at the global (module) level so that it can be pickled and sent to the worker processes.
To achieve what you want you can do the following:
from concurrent import futures

# Has to be a global (module-level) function
def func(para):
    print(para)

def upper(paras, threads):
    # This cannot be a local function.
    # def func(para):
    #     print(para)
    def multi_process(func, paras, threads):
        with futures.ProcessPoolExecutor(max_workers=threads) as pool:
            res = pool.map(func, paras, chunksize=threads)
            return list(res)
    p = multi_process(func, paras, threads)
    return p

if __name__ == "__main__":
    paras = [1, 2, 3]
    threads = 3
    p = upper(paras, threads)
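The underlying constraint is that ProcessPoolExecutor pickles the callable in order to send it to the workers, and a function is pickled by reference to its module-level name, which a nested function does not have. A small illustration of that constraint using the pickle module alone (the function names here are made up for the example):
import pickle

def top_level(x):
    return x + 1

def make_local():
    def local(x):
        return x + 1
    return local

if __name__ == "__main__":
    pickle.dumps(top_level)  # fine: pickled by its module-level name
    try:
        pickle.dumps(make_local())
    except (AttributeError, pickle.PicklingError) as exc:
        print(exc)  # e.g. Can't pickle local object 'make_local.<locals>.local'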

How to execute a function in parallel?

I am trying to call this function [1] in parallel. To do that I have created the function in [2], and I call it as shown in [3] and [4]. The problem is that when I execute this code, the execution hangs and I never see the result, but if I execute run_simple_job serially, everything goes fine. Why can't I execute this function in parallel? Any advice?
[1] function that I am trying to call
#make_verbose
def run_simple_job(job_params):
    """
    Execute a job remotely, and get the digests.
    The output will come as a json file and it contains info about the input and output path, and the generated digest.
    :param job_params: (namedtuple) contains several attributes important for the job during execution.
        client_id (string) id of the client.
        command (string) command to execute the job
        cluster (string) where the job will run
        task_type (TypeTask) contains information about the job that will run
        should_tamper (Boolean) Tells if this job should tamper the digests or not
    :return : output (string) the output of the job execution
    """
    client_id = job_params.client_id
    _command = job_params.command
    cluster = job_params.cluster
    task_type = job_params.task_type
    output = ...  # execute job
    return output
[2] function that calls in parallel
def spawn(f):
    # 1 - how the pipe and x attributes end up here?
    def fun(pipe, x):
        pipe.send(f(x))
        pipe.close()
    return fun

def parmap2(f, X):
    pipe = [Pipe() for x in X]
    # 2 - what is happening with the tuples (c,x) and (p, c)?
    proc = [Process(target=spawn(f), args=(c, x))
            for x, (p, c) in izip(X, pipe)]
    for p in proc:
        logging.debug("Spawn")
        p.start()
    for p in proc:
        logging.debug("Joining")
        p.join()
    return [p.recv() for (p, c) in pipe]
[3] Wrapper class
class RunSimpleJobWrapper:
    """ Wrapper used when running a job """
    def __init__(self, params):
        self.params = params
[4] How I call the function to run in parallel
for cluster in clusters:
    task_type = task_type_by_cluster[cluster]
    run_wrapper_list.append(RunSimpleJobWrapper(get_job_parameter(client_id, cluster, job.command, majority(FAULTS), task_type)))
jobs_output = parmap2(run_simple_job_wrapper, run_wrapper_list)
You could simply use multiprocessing:
from multiprocessing import Pool

pool = Pool()  # Pool() with no argument uses all available CPUs
param_list = [...]  # generate a list of your parameters
results = pool.map(run_simple_job, param_list)
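As for why the original parmap2 hangs: a common cause with this Pipe-based pattern is joining the children before reading from the pipes; if a child's result is larger than the pipe buffer, its send() blocks until the parent calls recv(), so join() never returns and both sides wait forever. A sketch of the same pattern with the receive moved before the join, and with the worker lifted to module level so it also works under the spawn start method (_call_and_send and square are illustrative names, not from the original post):
from multiprocessing import Pipe, Process

def _call_and_send(f, pipe, x):
    # Module-level helper so it can also be pickled under the spawn start method.
    pipe.send(f(x))
    pipe.close()

def parmap2(f, X):
    pipes = [Pipe() for _ in X]
    procs = [Process(target=_call_and_send, args=(f, child, x))
             for x, (parent, child) in zip(X, pipes)]
    for p in procs:
        p.start()
    # Drain the pipes *before* joining: a child whose result exceeds the pipe
    # buffer blocks in send(), and join() would then never return.
    results = [parent.recv() for (parent, child) in pipes]
    for p in procs:
        p.join()
    return results

def square(x):
    return x * x

if __name__ == "__main__":
    print(parmap2(square, [1, 2, 3, 4]))  # [1, 4, 9, 16]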

Why import instruction is not executed into python Process

I'm looking for a way to use multiprocessing to run scripts.
I have a function which launches 4 processes; each process executes a script through runpy.run_path() and gets a return value back.
Example:
import runpy
from multiprocessing import Process, Manager, Lock

def valorise(product, dico_valo):
    res = runpy.run_path(product + "/PyScript.py", run_name="__main__")
    dico_valo[product] = res["ret"]

def f(mutex, l, dico):
    while len(l) != 0:
        mutex.acquire()
        product = l.pop(0)
        mutex.release()
        p = Process(target=valorise, args=(product, dico))
        p.start()
        p.join()

def run_parallel_computations(valuationDate, list_scripts):
    if len(list_scripts) > 0:
        print('\n\nPARALLEL COMPUTATIONS BEGIN..........\n\n')
        manager = Manager()
        l = manager.list(list_scripts)
        dico = manager.dict()
        mutex = Lock()
        p1 = Process(target=f, args=(mutex, l, dico), name="script1")
        p2 = Process(target=f, args=(mutex, l, dico), name="script2")
        p3 = Process(target=f, args=(mutex, l, dico), name="script3")
        p4 = Process(target=f, args=(mutex, l, dico), name="script4")
        p1.start()
        p2.start()
        p3.start()
        p4.start()
        p1.join()
        p2.join()
        p3.join()
        p4.join()
        dico_isin = {}
        for i in iter(dico.keys()):
            dico_isin[i] = dico[i]
        print('\n\nPARALLEL COMPUTATIONS END..........')
        return dico
    else:
        print('\n\nNOTHING TO PRICE !')
In every PyScript.py I import a library, and each script has to import it again. However, it doesn't work the way I want and I don't understand why: my library is imported once, during the first process, and the same "import" is reused in the other processes.
Could you help me?
Thank you!
It might not be the case in multiprocessing (but it looks like it is).
When you try to import something more than once (e.g. import re in most of your modules), Python will not "re-import" it: it sees the module among those already imported (in sys.modules) and skips it.
To force reloading you can try reload(module_name) (importlib.reload in Python 3). Note that it cannot reload a single class or method imported from a module; you reload the whole module or nothing.
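A minimal sketch of the caching behaviour described above, using only the standard library (json is just an example module):
import importlib
import sys

import json                   # first import: the module is loaded and cached
print("json" in sys.modules)  # True

import json                   # second import: nothing is re-executed, the cache is reused

importlib.reload(json)        # forces the module's code to run again
Within a single interpreter this cache always applies; a forked child process also inherits the parent's already-imported modules, whereas a freshly spawned process starts with its own interpreter state.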
