I am trying to use a TensorRT engine for inference in a Python class that inherits from multiprocessing.Process. The engine works in a standalone Python script on my system, but now that I am integrating it into the codebase, the multiprocessing used in the class seems to be causing problems.
I am not getting any errors; execution just skips everything after the line self.runtime = trt.Runtime(self.trt_logger). My VS Code debugger does not step into the function either.
The docs mention the following, which I do not fully understand:
The TensorRT builder may only be used by one thread at a time. If you need to run multiple builds simultaneously, you will need to create multiple builders. The TensorRT runtime can be used by multiple threads simultaneously, so long as each object uses a different execution context.
The following parts of my code are started, joined and terminated from another file:
# more imports
import logging
import multiprocessing
import os

import tensorrt as trt
import pycuda.driver as cuda
import pycuda.autoinit

class MyClass(multiprocessing.Process):
    def __init__(self, messages):
        multiprocessing.Process.__init__(self)
        # other stuff
        self.exit = multiprocessing.Event()

    def load_tensorrt_model(self, config):
        '''Load tensorrt model with engine'''
        logging.debug('Start')

        # Reading the config parameters related to the engine
        engine_file = config['trt_engine']['trt_folder'] + os.path.sep + config['trt_engine']['engine_file']
        class_names_file = config['trt_engine']['trt_folder'] + os.path.sep + config['trt_engine']['class_names_file']

        # Verify if all the necessary files are present, if so load the detection network
        if os.path.exists(engine_file) and os.path.exists(class_names_file):
            try:
                logging.debug('In try statement')
                self.trt_logger = trt.Logger()
                f = open(engine_file, 'rb')
                logging.debug('I can get here, but no further')
                self.runtime = trt.Runtime(self.trt_logger)
                logging.debug('Cannot get here')
                self.engine = self.runtime.deserialize_cuda_engine(f.read())
                # More stuff
I found someone with a similar multithreading problem, but so far I have not been able to use it to solve my problem.
Any help is appreciated.
System specs:
Python 3.6.9
Jetson NX
JetPack 4.4.1
L4T 32.4.4
TensorRT 7.1.3.0-1
CUDA 10.2
Ubuntu 18.04
I had the same problem. It seems pycuda.autoinit does not work well in a multiprocess scenario.
Try replacing import pycuda.autoinit with:
cuda.init()
self.cuda_context = cuda.Device(0).make_context()
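For what it's worth, here is a minimal sketch of where that explicit context creation could live: inside the child process's run() method, so the CUDA context is created in the process that actually uses it, rather than being inherited from pycuda.autoinit in the parent. The engine_file argument and the cleanup in the finally block are assumptions on my part, not part of the original answer.
import multiprocessing

import pycuda.driver as cuda   # note: no pycuda.autoinit
import tensorrt as trt

class MyClass(multiprocessing.Process):
    def __init__(self, messages, engine_file):
        multiprocessing.Process.__init__(self)
        self.engine_file = engine_file     # path to the serialized engine (assumed argument)
        self.exit = multiprocessing.Event()

    def run(self):
        # Create the CUDA context in the child process, not in the parent.
        cuda.init()
        self.cuda_context = cuda.Device(0).make_context()
        try:
            self.trt_logger = trt.Logger()
            self.runtime = trt.Runtime(self.trt_logger)
            with open(self.engine_file, 'rb') as f:
                self.engine = self.runtime.deserialize_cuda_engine(f.read())
            # ... create an execution context and run inference until self.exit is set ...
        finally:
            # Release the context when the process shuts down.
            self.cuda_context.pop()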
I have a few machine learning models running via TensorFlow Serving on Kubernetes. I'd like to be able to have one deployment of a particular model, and then load multiple versions.
This seems like it would be easier than having to maintain a separate Kubernetes deployment for each version of each model that we have.
But it's not obvious how to pass the version or model flavor I want to call using the Python gRPC interface to TF Serving. How do I specify the version and pass it in?
For whatever reason, it's not possible to update the model spec in place while you're building the prediction request. Instead, you need to separately build an instance of ModelSpec that includes the version you want, and then pass that in to the constructor of the prediction request.
Also worth pointing out you need to use the Google-specific Int64Value for the version.
from google.protobuf.wrappers_pb2 import Int64Value
from tensorflow_serving.apis.model_pb2 import ModelSpec
from tensorflow_serving.apis import predict_pb2, get_model_metadata_pb2, \
    prediction_service_pb2_grpc
from tensorflow import make_tensor_proto
import grpc
import numpy as np

model_name = 'mymodel'
input_name = 'model_input'
model_uri = 'mymodel.svc.cluster.local:8500'

X = # something that works

channel = grpc.insecure_channel(model_uri, options=MESSAGE_OPTIONS)  # MESSAGE_OPTIONS defined elsewhere
stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

version = Int64Value(value=1)
model_spec = ModelSpec(version=version, name=model_name, signature_name='serving_default')

request = predict_pb2.PredictRequest(model_spec=model_spec)
request.inputs[input_name].CopyFrom(make_tensor_proto(X.astype(np.float32), shape=X.shape))
result = stub.Predict(request, 1.0)  # 1.0 second timeout
channel.close()
When I run the GridSearchCV() and RandomizedSearchCV() methods in parallel (with n_jobs>1 or n_jobs=-1 set), it shows this message:
ImportError: [joblib] Attempting to do parallel computing without protecting your import on a system that does not support forking. To use parallel-computing in a script, you must protect your main loop using "if __name__ == '__main__'". Please see the joblib documentation on Parallel for more information.
I put the code in a class in a .py file and call it under if __name__ == '__main__' in another .py file, but it still shows this message.
It works fine when n_jobs=1.
import platform; print(platform.platform())
Windows-10-10.0.10586-SP0
import numpy; print("NumPy", numpy.__version__)
NumPy 1.13.1
import scipy; print("SciPy", scipy.__version__)
SciPy 0.19.1
import sklearn; print("Scikit-Learn", sklearn.__version__)
Scikit-Learn 0.19.0
UPDATE
I tried this code but it still gives me the same error
import numpy as np
import pandas as pd
from sklearn.model_selection import RandomizedSearchCV
from sklearn.tree import DecisionTreeClassifier

class Test():
    def __init__(self):
        attributes = [..]
        dataset = pd.read_csv("..")
        X = dataset[[..]]
        Y = dataset[...]
        model = DecisionTreeClassifier()
        model = RandomizedSearchCV(....)
        model.fit(X, Y)

if __name__ == '__main__':
    Test()
joblib is known for this behaviour and is rather explicit in documenting it:
Warning
Under Windows, it is important to protect the main loop of code to avoid recursive spawning of subprocesses when using joblib.Parallel. In other words, you should be writing code like this:
import ....

def function1(...):
    ...

def function2(...):
    ...

...
if __name__ == '__main__':
    # do stuff with imports and functions defined above
    ...
No code should run outside of the “if __name__ == ‘__main__’” blocks, only imports and definitions.
So refactor your code to meet this well-defined requirement, and it will start to benefit from the power of the joblib tools.
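As an illustration, here is a minimal sketch of how the Test class from the question could be rearranged so that nothing executes outside the guard; the CSV path, column names and search parameters are placeholders, not the OP's actual values.
import pandas as pd
from sklearn.model_selection import RandomizedSearchCV
from sklearn.tree import DecisionTreeClassifier

def run_search(csv_path, feature_cols, target_col):
    # Only definitions live at module level; all work happens inside this function.
    dataset = pd.read_csv(csv_path)
    X = dataset[feature_cols]
    Y = dataset[target_col]
    search = RandomizedSearchCV(
        DecisionTreeClassifier(),
        param_distributions={'max_depth': [2, 4, 8, None]},
        n_iter=4,
        n_jobs=-1,  # parallel workers are safe because the call below is guarded
    )
    search.fit(X, Y)
    return search

if __name__ == '__main__':
    run_search('data.csv', ['feature_a', 'feature_b'], 'label')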
I imagine this won't be the most useful answer, but you could always parallelize the process manually. https://docs.python.org/2/library/multiprocessing.html
I wrote a very simple program that runs just fine without distribution but hangs on CheckpointSaverHook in distributed mode (everything on my localhost though!). I've seen there's been a few questions about hanging in distributed mode, but none seem to match my question.
Here's the script (made to toy with the new layers API):
import numpy as np
import tensorflow as tf
from tensorflow.contrib.learn.python.learn import learn_runner
from tensorflow.contrib import layers

DATA_SIZE=10
DIMENSION=5
FEATURES='features'

def generate_input_fn():
    def _input_fn():
        mid = int(DATA_SIZE/2)
        data = np.array([np.ones(DIMENSION) if x < mid else -np.ones(DIMENSION) for x in range(DATA_SIZE)])
        labels = ['0' if x < mid else '1' for x in range(DATA_SIZE)]
        table = tf.contrib.lookup.string_to_index_table_from_tensor(tf.constant(['0', '1']))
        label_tensor = table.lookup(tf.convert_to_tensor(labels, dtype=tf.string))
        return dict(zip([FEATURES], [tf.convert_to_tensor(data, dtype=tf.float32)])), label_tensor
    return _input_fn

def build_estimator(model_dir):
    features = layers.real_valued_column(FEATURES, dimension=DIMENSION)
    return tf.contrib.learn.DNNLinearCombinedClassifier(
        model_dir=model_dir,
        dnn_feature_columns=[features],
        dnn_hidden_units=[20,20])

def generate_exp_fun():
    def _exp_fun(output_dir):
        return tf.contrib.learn.Experiment(
            build_estimator(output_dir),
            train_input_fn=generate_input_fn(),
            eval_input_fn=generate_input_fn(),
            train_steps=100
        )
    return _exp_fun

if __name__ == '__main__':
    tf.logging.set_verbosity(tf.logging.DEBUG)
    learn_runner.run(generate_exp_fun(), 'job_dir')
To test distributed mode, I simply launch it with the environment variable TF_CONFIG={"cluster": {"ps":["localhost:5040"], "worker":["localhost:5041"]}, "task":{"type":"worker","index":0}, "environment": "local"} (this is for the worker; the same config with the ps type is used to launch the parameter server).
I use tensorflow-1.0.1 (but had the same behavior with 1.0.0) on windows-64, CPU only. I never actually get any error; it just hangs after INFO:tensorflow:Create CheckpointSaverHook. forever... I've tried to attach the Visual Studio C++ debugger to the process, but with little success so far, so I can't print a stack trace for what's happening in the native part.
P.S.: it's not a problem with DNNLinearCombinedClassifier, because it fails as well with a simple tf.contrib.learn.LinearClassifier. And as noted in the comments, it's not due to both processes running on localhost, since it also fails when they run on separate VMs.
EDIT: I think there's actually an issue with server launching. It looks like the server is not launched when you're in local mode (no matter if distributed or not), cf. tensorflow/contrib/learn/python/learn/experiment.py l.250-258:
# Start the server, if needed. It's important to start the server before
# we (optionally) sleep for the case where no device_filters are set.
# Otherwise, the servers will wait to connect to each other before starting
# to train. We might as well start as soon as we can.
config = self._estimator.config
if (config.environment != run_config.Environment.LOCAL and
    config.environment != run_config.Environment.GOOGLE and
    config.cluster_spec and config.master):
  self._start_server()
This will prevent the server from being started in local mode for the workers... Anyone has an idea if it's a bug or there's something I'm missing?
So this has been answered in https://github.com/tensorflow/tensorflow/issues/8796: in the end, one should use the CLOUD environment for any distributed operation.
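For concreteness, here is a sketch of the worker launch from the question with the environment switched from "local" to "cloud", which is what allows the in-process server to start. The cluster spec is the one used above; setting the variable from Python rather than the shell is just for illustration.
import json
import os

# Same cluster/task layout as in the question; only "environment" changes.
os.environ['TF_CONFIG'] = json.dumps({
    "cluster": {"ps": ["localhost:5040"], "worker": ["localhost:5041"]},
    "task": {"type": "worker", "index": 0},
    "environment": "cloud",  # "local" keeps _start_server() from ever being called
})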
I think those messages are really useful the first few times, but after that they are just noise. They actually make the output harder to read and debug.
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:119] Couldn't open CUDA library libcudnn.so. LD_LIBRARY_PATH:
I tensorflow/stream_executor/cuda/cuda_dnn.cc:3459] Unable to load cuDNN DSO
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcurand.so.8.0 locally
Is there a way to suppress the ones that just say it was successful?
UPDATE (beyond 1.14): see my more thorough answer here (this is a dupe question anyway): https://stackoverflow.com/a/38645250/6557588
In addition to Wintro's answer, you can also disable/suppress TensorFlow logs from the C side (i.e. the uglier ones starting with single characters: I, E, etc.). The open issue regarding logging has been updated to state that you can now control logging via an environment variable called TF_CPP_MIN_LOG_LEVEL. It defaults to 0 (all logs shown), but can be set to 1 to filter out INFO logs, 2 to additionally filter out WARNING logs, and 3 to additionally filter out ERROR logs. It appears to be in master now and will likely be part of future versions (i.e. versions after r0.11). See this page for more information. Here is an example of changing the verbosity using Python:
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3' # or any {'0', '1', '2'}
import tensorflow as tf
You can set this environment variable in the environment that you run your script in. For example, with bash this can go in ~/.bashrc, /etc/environment, /etc/profile, or be set in the actual shell invocation as:
TF_CPP_MIN_LOG_LEVEL=2 python my_tf_script.py
You can set the verbosity levels of TensorFlow's logging using
tf.logging.set_verbosity(tf.logging.ERROR)
where ERROR can be any of DEBUG, INFO, WARN, ERROR, or FATAL. See the logging module.
However, setting this to ERROR does not always completely block all INFO logs; to block them entirely, you have two main choices in my opinion.
If you are using Linux, you can just grep out all output strings beginning with I tensorflow/.
Otherwise, you can completely rebuild TensorFlow with some modified files. See this answer.
If you're using TensorFlow version 1 (1.X), you can use
tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.ERROR)
As of Tensorflow v1.14 (yes, including version 2.x) you can use the native logging module to silence Tensorflow:
import logging
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3' # FATAL
logging.getLogger('tensorflow').setLevel(logging.FATAL)
I personally use this in my projects:
def set_tf_loglevel(level):
    if level >= logging.FATAL:
        os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
    elif level >= logging.ERROR:
        os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
    elif level >= logging.WARNING:
        os.environ['TF_CPP_MIN_LOG_LEVEL'] = '1'
    else:
        os.environ['TF_CPP_MIN_LOG_LEVEL'] = '0'
    logging.getLogger('tensorflow').setLevel(level)
so that I can disable tf logging by running:
set_tf_loglevel(logging.FATAL)
and I can re-enable with
set_tf_loglevel(logging.INFO)
I created a function which shuts TF up. I call it at the start of my programs. Some messages are very annoying, and I cannot do anything else about them...
def tensorflow_shutup():
    """
    Make Tensorflow less verbose
    """
    try:
        # noinspection PyPackageRequirements
        import os
        from tensorflow import logging
        logging.set_verbosity(logging.ERROR)
        os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'

        # Monkey patching deprecation utils to shut it up! Maybe good idea to disable this once after upgrade
        # noinspection PyUnusedLocal
        def deprecated(date, instructions, warn_once=True):
            def deprecated_wrapper(func):
                return func
            return deprecated_wrapper

        from tensorflow.python.util import deprecation
        deprecation.deprecated = deprecated

    except ImportError:
        pass
Edit: This is for TF 2.0 and up:
def tensorflow_shutup():
    """
    Make Tensorflow less verbose
    """
    try:
        import os
        os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'

        # noinspection PyPackageRequirements
        import tensorflow as tf
        from tensorflow.python.util import deprecation
        tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.ERROR)

        # Monkey patching deprecation utils to shut it up! Maybe good idea to disable this once after upgrade
        # noinspection PyUnusedLocal
        def deprecated(date, instructions, warn_once=True):  # pylint: disable=unused-argument
            def deprecated_wrapper(func):
                return func
            return deprecated_wrapper

        deprecation.deprecated = deprecated

    except ImportError:
        pass
To anyone still struggling to get the os.environ solution to work as I was, check that this is placed before you import tensorflow in your script, just like craymichael's answer:
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3' # or any {'0', '1', '2'}
import tensorflow as tf
This is what worked for me:
import logging
import os

logging.getLogger('tensorflow').setLevel(logging.ERROR)
os.environ["KMP_AFFINITY"] = "noverbose"
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'

import tensorflow as tf
tf.autograph.set_verbosity(3)
Considering the previous answers, in TensorFlow 1.14 it is actually possible to eliminate all the messages generated by the library by including the following two lines in the code:
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.ERROR)
This method exploits both the environment-variable approach and TensorFlow's logging module.
Note: the compatibility version is necessary to avoid further warnings from the library, since the standard one is now deprecated.
For tensorflow 2.2 you can disable the logging with the following lines:
import tensorflow as tf
tf.get_logger().setLevel('ERROR')
This works for me without introducing any packages other than TensorFlow itself:
import tensorflow as tf
tf.autograph.set_verbosity(0) # "0" means no logging.
For more details, check this TensorFlow API documentation.
I am trying to calculate and generate plots using multiprocessing. On Linux the code below runs correctly, but on the Mac (Mountain Lion) it doesn't, giving the error below:
import multiprocessing
import matplotlib.pyplot as plt
import numpy as np
import rpy2.robjects as robjects

def main():
    pool = multiprocessing.Pool()
    num_figs = 2

    # generate some random numbers
    input = zip(np.random.randint(10,1000,num_figs),
                range(num_figs))

    pool.map(plot, input)

def plot(args):
    num, i = args
    fig = plt.figure()
    data = np.random.randn(num).cumsum()
    plt.plot(data)

main()
The rpy2 version is rpy2==2.3.1 and R is 2.13.2 (I could not install R 3.0 and the latest rpy2 on any Mac without getting a segmentation fault).
The error is:
The process has forked and you cannot use this CoreFoundation functionality safely. You MUST exec().
Break on __THE_PROCESS_HAS_FORKED_AND_YOU_CANNOT_USE_THIS_COREFOUNDATION_FUNCTIONALITY___YOU_MUST_EXEC__() to debug.
The process has forked and you cannot use this CoreFoundation functionality safely. You MUST exec().
I have tried everything to understand what the problem is with no luck. My configuration is:
Danials-MacBook-Pro:~ danialt$ brew --config
HOMEBREW_VERSION: 0.9.4
ORIGIN: https://github.com/mxcl/homebrew
HEAD: 705b5e133d8334cae66710fac1c14ed8f8713d6b
HOMEBREW_PREFIX: /usr/local
HOMEBREW_CELLAR: /usr/local/Cellar
CPU: dual-core 64-bit penryn
OS X: 10.8.3-x86_64
Xcode: 4.6.2
CLT: 4.6.0.0.1.1365549073
GCC-4.2: build 5666
LLVM-GCC: build 2336
Clang: 4.2 build 425
X11: 2.7.4 => /opt/X11
System Ruby: 1.8.7-358
Perl: /usr/bin/perl
Python: /usr/local/bin/python => /usr/local/Cellar/python/2.7.4/Frameworks/Python.framework/Versions/2.7/bin/python2.7
Ruby: /usr/bin/ruby => /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/bin/ruby
Any ideas?
This error occurs on Mac OS X when you perform a GUI operation outside the main thread, which is exactly what you are doing by shifting your plot function to the multiprocessing.Pool (I imagine that it will not work on Windows either, for the same reason: Windows has the same requirement). The only way I can imagine it working is to use the pool to generate the data, then have your main thread wait in a loop for the data that's returned (a queue is the way I usually handle it...).
Here is an example (recognizing that this may not do what you want: plot all the figures "simultaneously"?). plt.show() blocks, so only one figure is drawn at a time, and I note that you do not have it in your sample code, but without it I don't see anything on my screen. However, if I take it out, there is no blocking and no error, because all GUI functions are happening in the main thread:
import multiprocessing
import matplotlib.pyplot as plt
import numpy as np
import rpy2.robjects as robjects

data_queue = multiprocessing.Queue()

def main():
    pool = multiprocessing.Pool()
    num_figs = 10

    # generate some random numbers
    input = zip(np.random.randint(10,10000,num_figs), range(num_figs))
    pool.map(worker, input)

    figs_complete = 0
    while figs_complete < num_figs:
        data = data_queue.get()
        plt.figure()
        plt.plot(data)
        plt.show()
        figs_complete += 1

def worker(args):
    num, i = args
    data = np.random.randn(num).cumsum()
    data_queue.put(data)
    print('done ', i)

main()
Hope this helps.
I had a similar issue with my worker, which was loading some data, generating a plot, and saving it to a file. Note that this is slightly different from the OP's case, which seems to be oriented around interactive plotting. Still, I think it's relevant.
A simplified version of my code:
def worker(id):
    data = load_data(id)
    plot_data_to_file(data)  # Generates a plot and saves it to a file.

def plot_something_parallel(ids):
    pool = multiprocessing.Pool()
    pool.map(worker, ids)

plot_something_parallel(ids=[1,2,3])
This caused the same error others mention:
The process has forked and you cannot use this CoreFoundation functionality safely. You MUST exec().
Break on __THE_PROCESS_HAS_FORKED_AND_YOU_CANNOT_USE_THIS_COREFOUNDATION_FUNCTIONALITY___YOU_MUST_EXEC__() to debug.
Following #bbbruce's train of thought, I solved my problem by switching the matplotlib backend from TkAgg to the default. Specifically, I commented out the following line in my matplotlibrc file:
#backend : TkAgg
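If editing matplotlibrc is not convenient, a rough equivalent (an assumption on my part, not what this answer itself did) is to select a non-GUI backend in code before pyplot is imported:
import matplotlib
matplotlib.use('Agg')  # Agg is a non-interactive backend; pick whichever suits your setup
import matplotlib.pyplot as plt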
This might be rpy2-specific.
There are reports of a similar problem with OS X and multiprocessing here and there.
I think that using an initializer that imports the packages needed to run the code in plot could solve the problem (multiprocessing-doc).
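Here is a minimal sketch of that initializer idea, assuming each worker writes its figure to a file; the Agg backend and the file names are my additions, not something this answer specifies.
import multiprocessing

def init_worker():
    # Imports happen in each child process, so the parent never touches any
    # GUI/CoreFoundation state that the children would otherwise inherit.
    global plt, np
    import matplotlib
    matplotlib.use('Agg')  # non-interactive backend (assumption), keeps GUI code out entirely
    import matplotlib.pyplot as plt
    import numpy as np

def plot(args):
    num, i = args
    plt.figure()
    plt.plot(np.random.randn(num).cumsum())
    plt.savefig('fig_%d.png' % i)
    plt.close()

if __name__ == '__main__':
    pool = multiprocessing.Pool(initializer=init_worker)
    pool.map(plot, [(100, 0), (1000, 1)])
    pool.close()
    pool.join()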
I had a similar issue and found that setting the multiprocessing start method to forkserver works, as long as the call comes after your if __name__ == '__main__': statement.
if __name__ == '__main__':
    multiprocessing.set_start_method('forkserver')
    first_process = multiprocessing.Process(target=targetOne)
    second_process = multiprocessing.Process(target=targetTwo)
    first_process.start()
    second_process.start()
Try to upgrade matplotlib to 3.0.3:
pip3 install matplotlib --upgrade
Then everything goes fine.
=======================================================================
No need to read below anymore.
Yesterday my multiprocess plotting worked on my MacBook Air, but this morning it was not working on my MacBook Pro with the same code, displaying many of these:
The process has forked and you cannot use this CoreFoundation functionality safely. You MUST exec().
Break on __THE_PROCESS_HAS_FORKED_AND_YOU_CANNOT_USE_THIS_COREFOUNDATION_FUNCTIONALITY___YOU_MUST_EXEC__() to debug.
The process has forked and you cannot use this CoreFoundation functionality safely. You MUST exec().
Both machines use 4th-gen Intel CPUs (an i5-4xxx in the Air and an i7-4xxx in the Pro), so if there is no real difference in hardware, it must be software.
So I tried updating matplotlib to 3.0.3 on the MacBook Pro (it was 3.0.1), and everything works fine.
Also, no need to do pool.apply_async anymore.