How to run a PyTorch model in a normal, non-parallel way? - python

I am going through this script, and there is a code block that handles two options, DataParallel and DistributedDataParallel:
if not args.distributed:
    if args.arch.startswith('alexnet') or args.arch.startswith('vgg'):
        model.features = torch.nn.DataParallel(model.features)
        model.cuda()
    else:
        model = torch.nn.DataParallel(model).cuda()
else:
    model.cuda()
    model = torch.nn.parallel.DistributedDataParallel(model)
What if I don't want either of these options and want to run the model without even DataParallel? How do I do it?
How do I define my model so that it runs as a plain nn.Module without parallelizing anything?

DataParallel is a wrapper object to parallelize the computation on multiple GPUs of the same machine, see here.
DistributedDataParallel is also a wrapper object that lets you distribute the data on multiple devices, see here.
If you don't want it, you can simply remove the wrapper and use the model as it is:
if not args.distributed:
    if args.arch.startswith('alexnet') or args.arch.startswith('vgg'):
        model.features = model.features
        model.cuda()
    else:
        model = model.cuda()
else:
    model.cuda()
    model = model
This keeps code modifications to a minimum. Of course, since parallelization is of no interest to you, you could reduce this whole if statement to something along the lines of:
if args.arch.startswith('alexnet') or args.arch.startswith('vgg'):
    model.features = model.features
model = model.cuda()
Note that this code assumes you are running on the GPU.

DataParallel is a wrapper; you can bypass it and get just the original module by doing this:
my_model = model.module.to(device)
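For completeness, here is a minimal sketch of the fully non-parallel path; torchvision's resnet18 and the batch shape are just placeholders for illustration:
import torch
import torchvision.models as models  # placeholder model for illustration

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# a plain nn.Module, never wrapped in DataParallel or DistributedDataParallel
model = models.resnet18().to(device)
model.eval()

x = torch.randn(8, 3, 224, 224, device=device)
with torch.no_grad():
    y = model(x)  # ordinary single-device forward pass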

Related

SwishImplementation error when saving jit trace

I am trying to jit trace and save my PyTorch model from the segmentation models package, but I am getting an error: "Could not export Python function call 'SwishImplementation'. Remove calls to Python functions before export. Did you forget to add @script or @script_method annotation? If this is a nn.ModuleList, add it to __constants__". It only happens when I use the efficientnet backbone. How can I get the save() function to work? I need to be able to use the model in a C++ application.
import torch
import segmentation_models_pytorch as smp
model = smp.Unet('efficientnet-b7')
model.eval()
input = torch.randn((1,3,224,224))
torch_out = model(input)
model = torch.jit.trace(model,input)
trace_out = model(input)
model.save('model.pt')
The Unet model from the segmentation_models_pytorch package uses an EfficientNet encoder, which in turn uses a MemoryEfficientSwish module. To fix the error, change all instances of MemoryEfficientSwish to Swish before saving the model.
You can iterate through the model's modules and, whenever a module is an instance of EfficientNet, call .set_swish(memory_efficient=False).
After that you can load the state_dict, then trace and save the model, as sketched below.
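A minimal sketch of that approach; the EfficientNet import below assumes the efficientnet-pytorch package that segmentation_models_pytorch builds its encoders on:
import torch
import segmentation_models_pytorch as smp
from efficientnet_pytorch import EfficientNet  # assumption: the encoder subclasses this class

model = smp.Unet('efficientnet-b7')
model.eval()

# replace MemoryEfficientSwish with the traceable Swish in every EfficientNet submodule
for module in model.modules():
    if isinstance(module, EfficientNet):
        module.set_swish(memory_efficient=False)

example = torch.randn(1, 3, 224, 224)
traced = torch.jit.trace(model, example)
traced.save('model.pt')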

Where should I change the code to tell PyTorch not to use the GPU?

As mentioned in How to tell PyTorch to not use the GPU?, in order to tell PyTorch not to use the GPU you should change a few lines in the code.
Where should I make the change?
Which line of code needs to be modified?
I tried to find it but couldn't...
Calling the method .cpu() on any tensor or PyTorch module moves that component to the CPU, so the calculations will be done there.
Another option is the method .to("cpu"). Alternatively, you can replace "cpu" with the name of another device such as "cuda".
Example:
a)
model = MyModel().cpu() # move the model to the cpu
x = data.cpu() # move the input to the cpu
y = model(x)
b)
model = MyModel().to('cpu') # move the model to the cpu
x = data.to('cpu') # move the input to the cpu
y = model(x)
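Both variants can also be written against a single device object, which makes switching between CPU and GPU a one-line change; the linear layer and random input below are just stand-ins for your own model and data:
import torch
import torch.nn as nn

device = torch.device('cpu')        # or torch.device('cuda') to run on the GPU

model = nn.Linear(4, 2).to(device)  # stand-in for your own model
data = torch.randn(3, 4)            # stand-in for your own input
x = data.to(device)
y = model(x)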

How to do inference in parallel with tensorflow saved model predictors?

Tensorflow version: 1.14
Our current setup uses tensorflow estimators to do live NER, i.e. perform inference one document at a time. We have 30 different fields to extract and run one model per field, so we have a total of 30 models.
Our current setup uses python multiprocessing to do the inferences in parallel. (The inference is done on CPUs.) This approach reloads the model weights each time a prediction is made.
Using the approach mentioned here, we exported the estimator models as tf.saved_model. This works as expected in that it does not reload the weights for each request. It also works fine for single-field inference in one process, but doesn't work with multiprocessing: all the processes hang when we call the predict function (predict_fn in the linked post).
This post is related, but not sure how to adapt it for saved model.
Importing tensorflow individually for each of the predictors did not work either:
class SavedModelPredictor():
    def __init__(self, model_path):
        import tensorflow as tf
        self.predictor_fn = tf.contrib.predictor.from_saved_model(model_path)

    def predictor_fn(self, input_dict):
        return self.predictor_fn(input_dict)
How to make tf.saved_model work with multiprocessing?
Ray Serve, Ray's model serving solution, also supports offline batching. You can wrap your model in a Ray Serve backend and scale it to the number of replicas you want.
import ray
from ray import serve

client = serve.start()

class MyTFModel:
    def __init__(self, model_path):
        self.model = ... # load model

    @serve.accept_batch
    def __call__(self, input_batch):
        assert isinstance(input_batch, list)

        # forward pass
        self.model([item.data for item in input_batch])

        # return a list of responses
        return [...]

client.create_backend("tf", MyTFModel,
    # configure resources
    ray_actor_options={"num_cpus": 2, "num_gpus": 1},
    # configure replicas
    config={
        "num_replicas": 2,
        "max_batch_size": 24,
        "batch_wait_timeout": 0.5
    }
)
client.create_endpoint("tf", backend="tf")
handle = serve.get_handle("tf")

# perform inference on a list of inputs
futures = [handle.remote(data) for data in fields]
result = ray.get(futures)
Try it out with the nightly wheel and here's the tutorial: https://docs.ray.io/en/master/serve/tutorials/batch.html
Edit: updated the code sample for Ray 1.0
OK, so the approach outlined in this answer using Ray worked.
I built a class like this, which loads the model on init and exposes a run function to perform predictions:
import tensorflow as tf
import ray

ray.init()

@ray.remote
class MyModel(object):

    def __init__(self, field, saved_model_path):
        self.field = field
        # load the model once in the constructor
        self.predictor_fn = tf.contrib.predictor.from_saved_model(saved_model_path)

    def run(self, df_feature, *args):
        # ...
        # code to perform prediction using self.predictor_fn
        # ...
        return self.field, list_pred_string, list_pred_proba
Then used the above in the main module as:
from pathlib import Path

# form a dictionary with key 'field' and value MyModel
model_dict = {}
for field in fields:
    export_dir = f"saved_model/{field}"
    subdirs = [x for x in Path(export_dir).iterdir()
               if x.is_dir() and 'temp' not in str(x)]
    latest = str(sorted(subdirs)[-1])
    model_dict[field] = MyModel.remote(field, latest)
Then used the above model dictionary to do predictions like this:
results = ray.get([model_dict[field].run.remote(df_feature) for field in fields])
Update:
While this approach works, we found that running estimators in parallel with multiprocessing is faster than running predictors in parallel with Ray, especially for large documents. It looks like the predictor approach may work well for a small number of fields and when the input data is not large. An approach like the one mentioned here might be better for our use case.

Tensorflow Estimator: how to use tf.graph_util.convert_variables_to_constants

I would like to know if it is possible to use the function tf.graph_util.convert_variables_to_constants (in order to store the frozen version of the graph) in a train/evaluation loop while I'm using a custom estimator. For example:
best_validation_accuracy = -1
for _ in range(steps // how_often_validation):
    # Train the model
    estimator.train(input_fn=train_input_fn, steps=how_often_validation)
    # Evaluate the model
    validation_accuracy = estimator.evaluate(input_fn=eval_input_fn)
    # Save the best model
    if validation_accuracy["accuracy"] > best_validation_accuracy:
        best_validation_accuracy = validation_accuracy["accuracy"]
        # Save the best model's performance
        # I WANT TO USE tf.graph_util.convert_variables_to_constants HERE
To use the function tf.graph_util.convert_variables_to_constants, you need the graph and the session of your model.
After going through the TensorFlow code defining the estimators, it appears that:
This code is deprecated.
The graph is created on the fly and is not easily accessible (at least, I was not able to retrieve it).
Thus, we will have to use the good old method.
When you call estimator.train, checkpoints of your model are saved to the specified directory (estimator.model_dir). You can use those files to access the graph and session and freeze the variables as follows:
1. Load meta graph
saver = tf.train.import_meta_graph('/path/to/meta')
2. Load weights
sess = tf.Session()
saver.restore(sess, '/path/to/weights')
3. Freeze variables
tf.graph_util.convert_variables_to_constants(sess,
                                             sess.graph.as_graph_def(),
                                             ['output'])
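Putting the three steps together, a minimal sketch (assuming TF 1.x) that could be dropped into the validation loop from the question; output_node_names and the frozen-graph path are placeholders you would adapt to your own graph:
import tensorflow as tf

def freeze_latest_checkpoint(model_dir, output_node_names, out_path='frozen_model.pb'):
    # locate the newest checkpoint written by estimator.train()
    checkpoint = tf.train.latest_checkpoint(model_dir)
    with tf.Graph().as_default(), tf.Session() as sess:
        # 1. load the meta graph, 2. restore the weights
        saver = tf.train.import_meta_graph(checkpoint + '.meta')
        saver.restore(sess, checkpoint)
        # 3. freeze the variables into constants
        frozen = tf.graph_util.convert_variables_to_constants(
            sess, sess.graph.as_graph_def(), output_node_names)
        with tf.gfile.GFile(out_path, 'wb') as f:
            f.write(frozen.SerializeToString())

# inside the loop from the question:
# if validation_accuracy["accuracy"] > best_validation_accuracy:
#     best_validation_accuracy = validation_accuracy["accuracy"]
#     freeze_latest_checkpoint(estimator.model_dir, ['output'])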

Tensorflow - Using tf.summary with 1.2 Estimator API

I'm trying to add some TensorBoard logging to a model which uses the new tf.estimator API.
I have a hook set up like so:
summary_hook = tf.train.SummarySaverHook(
    save_secs=2,
    output_dir=MODEL_DIR,
    summary_op=tf.summary.merge_all())

# ...

classifier.train(
    input_fn,
    steps=1000,
    hooks=[summary_hook])
In my model_fn, I am also creating a summary -
def model_fn(features, labels, mode):
    # ... model stuff, calculate the value of loss
    tf.summary.scalar("loss", loss)
    # ...
However, when I run this code, I get the following error from the summary_hook:
"Exactly one of scaffold or summary_op must be provided." This is probably because tf.summary.merge_all() is not finding any summaries and is returning None, despite the tf.summary.scalar I declared in the model_fn.
Any ideas why this wouldn't be working?
Use tf.train.Scaffold() and pass tf.summary.merge_all() to it as follows:
summary_hook = tf.train.SummarySaverHook(
    save_secs=2,
    output_dir=MODEL_DIR,
    scaffold=tf.train.Scaffold(summary_op=tf.summary.merge_all()))
Just for whoever has this question in the future: the selected solution didn't work for me (see my comments on the selected solution).
Actually, with the TF 1.2 Estimator API, one doesn't need a summary_hook. I just have tf.summary.scalar("loss", loss) in the model_fn and run the code without a summary_hook. The loss is recorded and shown in TensorBoard. I'm not sure whether the TF API changed after this and similar questions were asked.
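For example, a minimal model_fn along those lines; the toy network, the "x" feature key, and the optimizer are illustrative placeholders (TF 1.x assumed):
import tensorflow as tf

def model_fn(features, labels, mode):
    # toy network just for illustration
    logits = tf.layers.dense(features["x"], 10)
    loss = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)

    # this summary is picked up by the Estimator's built-in summary saving;
    # no explicit SummarySaverHook is needed
    tf.summary.scalar("loss", loss)

    train_op = tf.train.AdamOptimizer().minimize(
        loss, global_step=tf.train.get_global_step())
    return tf.estimator.EstimatorSpec(mode, loss=loss, train_op=train_op)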
With TensorFlow r1.3:
Add your summary ops in your estimator's model_fn.
Example:
tf.summary.histogram(tensorOp.name, tensorOp)
If you feel that writing summaries may consume time and space, you can control the writing frequency of summaries in your Estimator's run_config:
run_config = tf.contrib.learn.RunConfig()
run_config = run_config.replace(model_dir=FLAGS.model_dir)
run_config = run_config.replace(save_summary_steps=150)
Note: this will affect the overall summary-writing frequency for TensorBoard logging of your estimator (tf.estimator.Estimator).
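For tf.estimator.Estimator (rather than tf.contrib.learn), the equivalent setting would look roughly like this, assuming a later TF 1.x release where tf.estimator.RunConfig accepts these arguments; FLAGS.model_dir and my_model_fn stand in for your own script's values:
import tensorflow as tf

run_config = tf.estimator.RunConfig(
    model_dir=FLAGS.model_dir,    # placeholder from the surrounding script
    save_summary_steps=150)       # write summaries every 150 steps

estimator = tf.estimator.Estimator(model_fn=my_model_fn, config=run_config)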
