Assuming I have a forward function that looks something like this:
def forward(self, input_data):
    x = self.net1(input_data)
    y = self.net2(input_data)
    out = torch.cat((x, y), dim=1)
    return out
where net1 and net2 are defined in my module as
self.net1 = nn.Sequential(<Multiple layers>)
self.net2 = nn.Sequential(<Multiple layers>)
How can I run the x and y calculations in parallel on the same or multiple GPUs? It seems time-consuming to wait for each operation to execute when they are independent of each other. Normally I would have used Python's multithreading or multiprocessing modules, however I don't think that is the correct approach here.
I know that data parallelism is used to run training in parallel on multiple GPUs, but I could not find out how to execute only a part of the forward function in parallel.
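For illustration, this is the kind of per-branch device placement I have in mind (a rough sketch only; TwoBranchNet and the layer sizes are placeholders for my real layers, and it assumes two GPUs are available):
import torch
import torch.nn as nn

# Sketch: each branch lives on its own device, so the two forward passes can
# overlap because CUDA kernels are launched asynchronously from Python.
class TwoBranchNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.net1 = nn.Sequential(nn.Linear(8, 4)).to('cuda:0')
        self.net2 = nn.Sequential(nn.Linear(8, 4)).to('cuda:1')

    def forward(self, input_data):
        x = self.net1(input_data.to('cuda:0'))
        y = self.net2(input_data.to('cuda:1'))
        # bring both outputs onto one device before concatenating
        return torch.cat((x, y.to('cuda:0')), dim=1)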
Related
I am running a program that uses [a lot of] looping to perform a type of bootstrap analysis. The computational step can take a long time (60+ seconds), which means that collectively this program could take almost a week to run serially. I have tried using multiprocessing on a high performance cluster - using concurrent.futures - but the analysis is just as slow. I know that misusing multiprocessing can lead to the same or worse performance than running serially, so I'm wondering at what point in my code this happens.
My analysis program looks like this, serially:
def analysis(dataset, subset_sizes, n_iters):
    for size in subset_sizes:
        for _ in range(n_iters):
            subsampled_dataset = random_sample(dataset, size)
            computation(subsampled_dataset)

for dataset in datasets:
    for _ in range(n_trials):
        analysis(dataset, **kwargs)
Using multiprocessing, it looks like this:
with concurrent.futures.ProcessPoolExecutor() as executor:
    futures = []
    for dataset in datasets:  # here, datasets looks like [datasetN_trialN, ...] to use one loop
        futures.append(executor.submit(analysis, **kwargs))
    results = [f.result() for f in concurrent.futures.as_completed(futures)]
Should I be calling executor.submit at a different point in my code?
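For reference, this is the kind of finer-grained submission I have been wondering about (a sketch only; it assumes random_sample and computation are importable at module level so they can be pickled):
import concurrent.futures

def one_iteration(dataset, size):
    # one expensive computation per task, so each 60+ second step is its own future
    subsampled_dataset = random_sample(dataset, size)
    return computation(subsampled_dataset)

with concurrent.futures.ProcessPoolExecutor() as executor:
    futures = []
    for dataset in datasets:
        for size in subset_sizes:
            for _ in range(n_iters):
                futures.append(executor.submit(one_iteration, dataset, size))
    results = [f.result() for f in concurrent.futures.as_completed(futures)]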
Hi, I don't feel like I have quite understood multiprocessing in Python correctly.
I want to run a function called 'run_worker' (which is simply code that runs and manages a subprocess) 20 times in parallel and wait for all the functions to complete. Each run_worker should run on a separate core/thread. I don't mind what order the processes complete in, hence I used async, and I don't have a return value, so I used map.
I thought that I should use:
if __name__ == "__main__":
    num_workers = 20
    param_map = []
    for i in range(num_workers):
        param_map += [experiment_id]

    pool = mp.Pool(processes=num_workers)
    pool.map_async(run_worker, param_map)
    pool.close()
    pool.join()
However, this code exits straight away and doesn't appear to execute run_worker properly. Also, do I really have to create a param_map of the same experiment_id to pass to the worker? This seems like a hack just to control the number of run_worker calls created. Ideally I would like to run a function with no parameters and no return value over multiple cores.
Note: I am using Windows Server 2019 on AWS.
Edit: added run_worker, which calls a subprocess that writes to a file:
def run_worker(experiment_id):
    hostname = socket.gethostname()
    experiment = conn.experiments(experiment_id).fetch()
    while experiment.progress.observation_count < experiment.observation_budget:
        suggestion = conn.experiments(experiment.id).suggestions().create()
        value = evaluate_model(suggestion.assignments)
        conn.experiments(experiment_id).observations().create(
            suggestion=suggestion.id,
            value=value,
            metadata=dict(hostname=hostname),
        )
        # Update the experiment object
        experiment = conn.experiments(experiment_id).fetch()
It seems that for this simple purpose you would be better off using pool.map instead of pool.map_async. They both run in parallel; however, pool.map blocks until all operations are finished (see also this question). pool.map_async is meant especially for situations like this:
result = pool.map_async(func, iterable)
while not result.ready():
    # do some work while map_async is running
    pass
# blocking call to get the result
out = result.get()
Regarding your question about the parameters, the fundamental idea of a map operation is to map the values of one list/array/iterable to a new list of values of the same size. As far as I can see in the docs, multiprocessing does not provide any method to run multiple functions without parameters.
If you would also share your run_worker function, that might help to get better answers to your question. That might also clear up why you would run a function without any arguments and return values using a map operation in the first place.
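For what it's worth, if the goal really is just "run the same call 20 times and wait", one option is apply_async in a loop, which needs no dummy param_map (a sketch only, assuming run_worker and experiment_id are defined as in your question):
import multiprocessing as mp

if __name__ == "__main__":
    num_workers = 20
    pool = mp.Pool(processes=num_workers)
    # submit the same call 20 times; no iterable of dummy parameters is needed
    for _ in range(num_workers):
        pool.apply_async(run_worker, (experiment_id,))
    pool.close()
    pool.join()  # blocks until all 20 calls have finished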
I have PySpark code that has 3 functions. The first function loads some data and prepares it for the other two functions. The other two functions take this output, do some work, and generate their respective outputs.
So the code will look something like this,
def first_function():
    # load data
    # pre-process
    # return pre-processed data

def second_function(output_of_first_function):
    # tasks for second function
    # return output

def third_function(output_of_first_function):
    # tasks for third function
    # return output
And these functions are called from a main function like this,
def main():
    output_from_first_function = first_function()
    output_from_second_function = second_function(output_from_first_function)
    output_from_third_function = third_function(output_from_first_function)
There is no interdependence between second_function and third_function. I'm looking for a way to run these two functions in parallel at the same time. There are some transforms happening inside these functions, so it may help to run them in parallel.
How can I run second_function and third_function in parallel? Should each of these functions create its own SparkContext, or can they share one?
From your problem, it doesn't seem like you really need PySpark. I think you should consider using Python's threading module, as described in this post: How to run independent transformations in parallel using PySpark?
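For instance, a minimal sketch of that idea using the function names from the question (both threads share the one SparkSession on the driver; the heavy lifting still happens on the executors, so the GIL is not the bottleneck here):
import threading

def main():
    output_from_first_function = first_function()
    results = {}

    def run_second():
        results["second"] = second_function(output_from_first_function)

    def run_third():
        results["third"] = third_function(output_from_first_function)

    t2 = threading.Thread(target=run_second)
    t3 = threading.Thread(target=run_third)
    t2.start()
    t3.start()
    t2.join()
    t3.join()
    return results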
I have a large Python script (an economic model with more than 1,500 rows) which I want to execute in parallel on several CPU cores. All the examples for multiprocessing I have found so far were about simple functions, not whole scripts. Could you please give me a hint on how to achieve this?
Thanks!
Clarification: the model generates as output a dataset for a multitude of variables. Each result is randomly different from the other model runs. Therefore I have to run the model often enough until some deviation measure is achieved (let's say 50 times). The model input is always the same, but not the output.
Edit, got it:
import os
from multiprocessing import Pool

n_cores = 4
n_iterations = 5

def run_process(process):
    os.system('python myscript.py')

if __name__ == '__main__':
    p = Pool(n_cores)
    p.map(run_process, range(n_iterations))
If you want to use a pool of workers, I usually do the following.
import multiprocessing as mp

def MyFunctionInParallel(foo, bar, queue):
    res = foo + bar
    queue.put({res: res})
    return

if __name__ == '__main__':
    data = []
    info = {}
    num = ...       # number of models to run
    numProcs = ...  # number of worker processes to launch
    ManQueue = mp.Manager().Queue()
    with mp.Pool(processes=numProcs) as pool:
        pool.starmap(MyFunctionInParallel, [(data[v], info, ManQueue)
                                            for v in range(num)])
    resultdict = {}
    for i in range(num):
        resultdict.update(ManQueue.get())
To be clearer, your script becomes the body of MyFunctionInParallel. This means that you need to change your script slightly so that the variables which depend on your input (i.e. each of your models) can be passed as arguments to MyFunctionInParallel.

Then, depending on what you want to do with the results you get for each run, you can either use a Queue as sketched above or, for example, write your results to a file. If you use a Queue, it means that you want to be able to retrieve your data at the end of the parallel execution (i.e. in the same script execution); I would advise using dictionaries to store your results in the Queue, as they are very flexible about the data they can contain. On the other hand, writing your results to a file is, I guess, better if you wish to share them with other users/applications. You have to be careful with concurrent writes from all the workers in order to produce meaningful output, but writing one file per model can also be OK.
In the main part of the code, num would be the number of models you will be running, data and info some parameters which are specific (or not) to each model, and numProcs the number of processes you wish to launch. The call to starmap basically maps the argument tuples in the list comprehension onto the calls of MyFunctionInParallel, allowing each execution to have different input arguments.
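As an illustration of the "one file per model" option, here is a minimal sketch (run_model and its parameters are hypothetical stand-ins for your actual model code):
import json
import multiprocessing as mp

def run_model(model_id, params):
    # placeholder for the real model; each worker writes its own output file,
    # so no Queue and no concurrent writes to a shared file are needed
    results = {"model_id": model_id, "params": params}
    with open(f"results_model_{model_id}.json", "w") as f:
        json.dump(results, f)

if __name__ == '__main__':
    params_per_model = [{"seed": i} for i in range(50)]
    with mp.Pool(processes=4) as pool:
        pool.starmap(run_model, enumerate(params_per_model))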
I would like to use multiple processes (not threads) to do some preprocessing and enqueue the results to a tf.RandomShuffleQueue which can be used by my main graph for training.
Is there a way to do that ?
My actual problem
I have converted my dataset into TFRecords split across 256 shards. I want to start 20 processes using multiprocessing and let each one process a range of shards. Each process should read images, augment them, and push them into a tf.RandomShuffleQueue, from which the input can be given to a graph for training.
Some people advised me to go through the Inception example in TensorFlow. However, it is a very different situation, because there only the reading of the data shards is done by multiple threads (not processes), while the preprocessing (e.g. augmentation) takes place in the main thread.
(This aims to solve your actual problem)
In another topic, someone told you that Python has the global interpreter lock (GIL) and therefore there would be no speed benefits from multi-core, unless you used multiple processes.
This was probably what prompted your desire to use multiprocessing.
However, with TF, Python is normally used only to construct the "graph". The actual execution happens in native code (or on the GPU), where the GIL plays no role whatsoever.
In light of this, I recommend simply letting TF use multithreading. This can be controlled using the intra_op_parallelism_threads argument, such as:
with tf.Session(graph=graph,
                config=tf.ConfigProto(allow_soft_placement=True,
                                      intra_op_parallelism_threads=20)) as sess:
    # ...
(Side note: if you have, say, a 2-CPU, 32-core system, the best argument may very well be intra_op_parallelism_threads=16, depending on a lot of factors)
Comment: The pickling of TFRecords is not that important.
I can pass a list of lists containing names of ranges of sharded TFRecord files.
Therefore I have to restart the decision process!
Comment: I can pass it to a Pool.map() as an argument.
Verify whether a multiprocessing.Queue() can handle this.
The results of Tensor functions are Tensor objects.
Try the following:
tensor_object = func(TFRecord)
q = multiprocessing.Manager().Queue()
q.put(tensor_object)
data = q.get()
print(data)
Comment: How do I make sure that all the processes enqueue to the same queue?
This is simply done by enqueueing the results from Pool.map(...) after all processes have finished.
Alternatively, we can enqueue in parallel, queueing data from all processes.
But doing so depends on the data being picklable, as described above.
For instance:
import multiprocessing as mp

def func(filename):
    TFRecord = read(filename)        # read(...) is a placeholder for the actual reader
    tensor_obj = tf.func(TFRecord)   # tf.func(...) stands in for the preprocessing op
    return tensor_obj

def main_Tensor(tensor_objs):
    tf = ...  # instantiate the TensorFlow session here
    rsq = tf.RandomShuffleQueue(...)
    for t in tensor_objs:
        rsq.enqueue(t)

if __name__ == '__main__':
    sharded_TFRecords = ['file1', 'file2']
    with mp.Pool(20) as pool:
        # pool.map blocks until all workers have returned
        tensor_objs = pool.map(func, sharded_TFRecords)
    main_Tensor(tensor_objs)
It seems the recommended way to run TF with multiprocessing is to create a separate tf.Session for each child, as sharing it across processes is infeasible.
You can take a look at this example, I hope it helps.
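For example, here is a minimal sketch of the "one tf.Session per child process" idea (TF 1.x style; the constant tensor is just a stand-in for real per-shard preprocessing):
import multiprocessing as mp

def worker(shard_path):
    import tensorflow as tf  # import in the child so nothing TF-related is inherited
    with tf.Session() as sess:
        dummy = tf.constant([len(shard_path)])  # stand-in for real preprocessing
        return sess.run(dummy)

if __name__ == '__main__':
    shards = ['shard-00000', 'shard-00001']
    with mp.Pool(processes=2) as pool:
        print(pool.map(worker, shards))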
[EDIT: Old answer]
You can use a multiprocessing.Pool and rely on its callback mechanism to put results in the tf.RandomShuffleQueue as soon as they are ready.
Here's a very simple example of how to do it.
from multiprocessing import Pool

class Processor(object):
    def __init__(self, random_shuffle_queue):
        self.queue = random_shuffle_queue
        self.pool = Pool()

    def schedule_task(self, task):
        self.pool.apply_async(processing_function, args=[task], callback=self.task_done)

    def task_done(self, results):
        self.queue.enqueue(results)
This assumes Python 2; for Python 3 I'd recommend using concurrent.futures.ProcessPoolExecutor.
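A rough Python 3 equivalent of the sketch above, using add_done_callback instead of apply_async's callback argument (processing_function and the queue object are assumed to exist exactly as in the example):
from concurrent.futures import ProcessPoolExecutor

class Processor(object):
    def __init__(self, random_shuffle_queue):
        self.queue = random_shuffle_queue
        self.executor = ProcessPoolExecutor()

    def schedule_task(self, task):
        future = self.executor.submit(processing_function, task)
        future.add_done_callback(self.task_done)

    def task_done(self, future):
        # the callback receives the Future, not the result itself
        self.queue.enqueue(future.result())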