When compiling my program using Caffe2 I get these warnings:
[E init_intrinsics_check.cc:43] CPU feature avx is present on your machine, but the Caffe2 binary is not compiled with it. It means you may not get the full speed of your CPU.
[E init_intrinsics_check.cc:43] CPU feature avx2 is present on your machine, but the Caffe2 binary is not compiled with it. It means you may not get the full speed of your CPU.
[E init_intrinsics_check.cc:43] CPU feature fma is present on your machine, but the Caffe2 binary is not compiled with it. It means you may not get the full speed of your CPU.
Since I do want multi-threading support for Caffe2, I've searched for what to do. I've found that Caffe2 has to be recompiled, setting some arguments either when invoking cmake or in the CMakeLists file.
Since I already had pytorch installed in a conda env, I first uninstalled Caffe2 with:
pip uninstall -y caffe2
Then I followed the instructions from the Caffe2 docs to build it from source. I first installed the dependencies as indicated, then cloned pytorch inside my conda env with:
git clone https://github.com/pytorch/pytorch.git && cd pytorch
git submodule update --init --recursive
I think this is the moment to modify the pytorch/caffe2/CMakeLists.txt file I just downloaded. I have read that to enable multi-threading support it is sufficient to enable the option USE_NATIVE_ARCH inside this CMakeLists, but I'm not able to find such an option where I'm looking. Maybe I'm doing something wrong. Any thoughts? Thanks.
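(For what it's worth, a CMake option like this can usually be toggled without editing the file at all; the sketch below assumes PyTorch's setup.py forwards USE_*-style environment variables to CMake, which I have not verified for this version:)
# hypothetical invocation; assumes USE_NATIVE_ARCH is forwarded from the environment to CMake
USE_NATIVE_ARCH=ON python3 setup.py build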
Here are some details about my platform:
I'm on macOS Big Sur
My python version is 3.8.5
UPDATE:
To answer Nega, this is what I've got:
python3 -c 'import torch; print(torch.__config__.parallel_info())'
ATen/Parallel:
at::get_num_threads() : 1
at::get_num_interop_threads() : 4
OpenMP not found
Intel(R) Math Kernel Library Version 2020.0.2 Product Build 20200624 for Intel(R) 64 architecture applications
mkl_get_max_threads() : 4
Intel(R) MKL-DNN v0.21.1 (Git Hash 7d2fd500bc78936d1d648ca713b901012f470dbc)
std::thread::hardware_concurrency() : 8
Environment variables:
OMP_NUM_THREADS : [not set]
MKL_NUM_THREADS : [not set]
ATen parallel backend: OpenMP
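(Side note: thread counts can also be requested at runtime; whether the intra-op setting actually sticks depends on the build, and with OpenMP reported as not found it may simply stay at 1:)
python3 -c 'import torch; torch.set_num_threads(8); print(torch.get_num_threads())'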
UPDATE 2:
It turned out that the Clang that comes with Xcode doesn't support OpenMP. The gcc that I was using was just a symlink to Clang. In fact, after running gcc --version I got:
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/include/c++/4.2.1
Apple clang version 12.0.0 (clang-1200.0.32.29)
Target: x86_64-apple-darwin20.3.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
I installed gcc-10 from Homebrew and set an alias like this: alias gcc='gcc-10'. In fact, now with gcc --version this is what I get:
gcc-10 (Homebrew GCC 10.2.0_4) 10.2.0
Copyright (C) 2020 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
I've also tried a simple OpenMP Hello World using 8 threads and everything seems to work. However, after re-running the command:
python3 -c 'import torch; print(torch.__config__.parallel_info())'
I get the same outcome. Any thoughts?
AVX, AVX2, and FMA are CPU instruction set extensions and are not related to multi-threading. If the pip package for pytorch/caffe2 used these instructions on a CPU that didn't support them, the software wouldn't work. PyTorch installed via pip does come with multi-threading enabled, though. You can confirm this with torch.__config__.parallel_info()
❯ python3 -c 'import torch; print(torch.__config__.parallel_info())'
ATen/Parallel:
at::get_num_threads() : 6
at::get_num_interop_threads() : 6
OpenMP 201107 (a.k.a. OpenMP 3.1)
omp_get_max_threads() : 6
Intel(R) Math Kernel Library Version 2020.0.1 Product Build 20200208 for Intel(R) 64 architecture applications
mkl_get_max_threads() : 6
Intel(R) MKL-DNN v1.6.0 (Git Hash 5ef631a030a6f73131c77892041042805a06064f)
std::thread::hardware_concurrency() : 12
Environment variables:
OMP_NUM_THREADS : [not set]
MKL_NUM_THREADS : [not set]
ATen parallel backend: OpenMP
That being said, if you still want to continue building pytorch and caffe2 from source, the flag you're looking for, USE_NATIVE_ARCH, is in pytorch/CMakeLists.txt, one level up from caffe2. Edit that file and change USE_NATIVE_ARCH to ON, then continue building pytorch with python3 setup.py build. Note that USE_NATIVE_ARCH doesn't do what you think it does: it only allows building MKL-DNN with native CPU optimization flags. It does not trickle down to caffe2 (except where caffe2 uses MKL-DNN, obviously).
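As for the OpenMP part of your update: a shell alias only affects your interactive shell; CMake will still pick up Apple's clang. A sketch of a rebuild that makes the Homebrew toolchain explicit, assuming the usual CC/CXX conventions and PyTorch's USE_OPENMP switch:
# assumes Homebrew's gcc-10/g++-10 are on PATH; USE_OPENMP enables the OpenMP backend
CC=gcc-10 CXX=g++-10 USE_OPENMP=1 python3 setup.py build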
Related
My OS is CentOS 7.
I recently installed Conda and Mamba. Everything was working, but now it looks like that isn't the case: I cannot call Mamba, and I can barely make Conda do anything. Initially I installed Bakta, which works, but nothing else does, and I keep getting messages about package incompatibilities. I thought the environment I was in might be too up to date for the new program, and sure enough the program wants a few older versions (Python 3.6.5, for example), so I tried to downgrade and got a lot of flak (i.e., everything is incompatible). That could make sense, since I had upgraded conda and run update --all, but I then tried to create a new environment and now conda won't even create one; it fails to resolve anything. I tried going through pip3, and it looks like numpy is not able to build a wheel, so I tried addressing that and it keeps crashing out. Am I at the point where I should believe conda is broken? Here is the first error I am getting:
INFO:
########### CLIB COMPILER OPTIMIZATION ###########
INFO: Platform :
Architecture: x64
Compiler : unix-like
CPU baseline :
Requested : 'min'
Enabled : SSE SSE2 SSE3
Flags : -msse -msse2 -msse3
Extra checks: none
CPU dispatch :
Requested : 'max -xop -fma4'
Enabled : SSSE3 SSE41 POPCNT SSE42 AVX F16C FMA3 AVX2
Generated : none
INFO: CCompilerOpt.cache_flush[857] : write cache to path -> /tmp/pip-install-skejmo98/numpy_05946e1a2d574651a6019d72e428adca/build/temp.linux-x86_64-3.9/ccompiler_opt_cache_clib.py
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for numpy
Failed to build numpy
ERROR: Could not build wheels for numpy, which is required to install pyproject.toml-based projects
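(One possible way out, sketched under the assumption that conda itself can still solve environments: give the tool its own clean environment and let conda supply a prebuilt numpy instead of having pip compile one. fresh-env is a placeholder name:)
conda create -n fresh-env python=3.6.5 numpy
conda activate fresh-env
python -c "import numpy; print(numpy.__version__)"   # confirms the prebuilt numpy imports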
The PyTorch website says that PyTorch 1.12.1 is compatible with CUDA 11.6, but I get the following error:
NVIDIA GeForce RTX 3060 Laptop GPU with CUDA capability sm_86 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70.
I am using a laptop RTX 3060 and Poetry as my package manager in Python.
>>> nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Tue_Mar__8_18:18:20_PST_2022
Cuda compilation tools, release 11.6, V11.6.124
Build cuda_11.6.r11.6/compiler.31057947_0
>>> poetry show
certifi 2022.9.24 Python package for providing Mozilla's CA Bundle.
charset-normalizer 2.1.1 The Real First Universal Charset Detector. Open, modern and actively maintained alternative to Chardet.
idna 3.4 Internationalized Domain Names in Applications (IDNA)
numpy 1.23.4 NumPy is the fundamental package for array computing with Python.
opencv-contrib-python 4.6.0.66 Wrapper package for OpenCV python bindings.
opencv-python 4.6.0.66 Wrapper package for OpenCV python bindings.
pillow 9.2.0 Python Imaging Library (Fork)
requests 2.28.1 Python HTTP for Humans.
torch 1.12.1 Tensors and Dynamic neural networks in Python with strong GPU acceleration
torchvision 0.13.1 image and video datasets and models for torch deep learning
typing-extensions 4.4.0 Backported and Experimental Type Hints for Python 3.7+
urllib3 1.26.12 HTTP library with thread-safe connection pooling, file post, and more.
What am I missing here? Is this a PyTorch <> CUDA issue or a CUDA <> GPU issue?
NVIDIA GeForce RTX 3060 Laptop GPU with CUDA capability sm_86 is not
compatible with the current PyTorch installation. The current PyTorch
install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70.
The build of PyTorch which you have installed doesn't have binary support for your GPU. This is because whoever built the PyTorch you are using chose to build it like that. This isn't a question of CUDA versions or PyTorch versions. It's just that many frameworks are built with a limited range of binary architectures in order to keep the size of the packages they distribute small.
NVIDIA provides a method to support forward-compatible architectures running older code through JIT recompilation at runtime. Unfortunately, the standard PyTorch build system doesn't use it, in order to save space in its build distributions, so that cannot help you in this situation.
Your only solution is to source another build with the appropriate binary support for your GPU included, or to build PyTorch from source yourself.
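For example, the official CUDA 11.6 wheels for torch 1.12.1 do include sm_86 binaries; per the install selector on pytorch.org, pulling them with plain pip looks like this (with Poetry you would instead declare download.pytorch.org/whl/cu116 as an explicit package source):
# installs the cu116 builds instead of the default wheels
pip install torch==1.12.1+cu116 torchvision==0.13.1+cu116 --extra-index-url https://download.pytorch.org/whl/cu116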
I try to use pycocotools from python:
$ ipython
Python 3.6.5 | packaged by conda-forge | (default, Apr 6 2018, 13:39:56)
Type 'copyright', 'credits' or 'license' for more information
IPython 6.5.0 -- An enhanced Interactive Python. Type '?' for help.
In [1]: from pycocotools.coco import COCO
Segmentation Error (core dumped)
Can this be related to my CPU missing AVX instructions (Xeon E5520)?
In case pycocotools has issues with the compiled part, please run
pip uninstall -y pycocotools
pip install --no-binary :all: pycocotools
This should compile the C code shipped with pycocotools on your machine.
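A quick smoke test after reinstalling, just to confirm the import no longer crashes:
python3 -c 'from pycocotools.coco import COCO; print("pycocotools OK")'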
If your program tried to execute AVX instructions on a CPU that doesn't support them, you'd get SIGILL (Illegal Instruction) on OSes like Linux.
"Segmentation Error" seems to be some kind of custom error message, because the normal string is always "Segmentation Fault". It's possible that there's a bug or stricter alignment requirement in the non-AVX version of code that's selected at runtime; runtime dispatch would be a plausible mechanism for code working on an AVX CPU and segfaulting without AVX.
I've been trying to install Tensorflow for a few weeks now, and I keep getting a lot of errors with the simple installations, so I think it would be best for me to install Tensorflow from source. I'm following the instructions on the Tensorflow website exactly, and my ./configure is mostly default so I can see if it works before I make modifications:
./configure
Please specify the location of python. [Default is /usr/bin/python]: /Library/Frameworks/Python.framework/Versions/3.6/bin/python3
Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native]:
Do you wish to build TensorFlow with Google Cloud Platform support? [y/N] n
No Google Cloud Platform support will be enabled for TensorFlow
Do you wish to build TensorFlow with Hadoop File System support? [y/N] n
No Hadoop File System support will be enabled for TensorFlow
Do you wish to build TensorFlow with the XLA just-in-time compiler (experimental)? [y/N]
No XLA support will be enabled for TensorFlow
Found possible Python library paths:
/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages
Please input the desired Python library path to use. Default is [/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages]
/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages
Do you wish to build TensorFlow with OpenCL support? [y/N] n
No OpenCL support will be enabled for TensorFlow
Do you wish to build TensorFlow with CUDA support? [y/N] n
No CUDA support will be enabled for TensorFlow
INFO: Starting clean (this may take a while). Consider using --async if the clean takes more than several minutes.
Configuration finished
(This is not the first time I've edited the configuration)
After this, I execute the following bazel build command straight from the Tensorflow.org website instructions for installing from source:
bazel build --config=opt //tensorflow/tools/pip_package:build_pip_package
(In the future, I'm going to add some additional flags to account for the fact that I've been getting CPU instruction errors about SSE, AVX, etc.)
When I execute that bazel command, I get an extremely long wait time and a list of errors that piles up:
r08ErCk:tensorflow kendrick$ bazel build --config=opt //tensorflow/tools/pip_package:build_pip_package
WARNING: /Users/kendrick/tensorflow/tensorflow/contrib/learn/BUILD:15:1: in py_library rule //tensorflow/contrib/learn:learn: target '//tensorflow/contrib/learn:learn' depends on deprecated target '//tensorflow/contrib/session_bundle:exporter': Use SavedModel Builder instead.
WARNING: /Users/kendrick/tensorflow/tensorflow/contrib/learn/BUILD:15:1: in py_library rule //tensorflow/contrib/learn:learn: target '//tensorflow/contrib/learn:learn' depends on deprecated target '//tensorflow/contrib/session_bundle:gc': Use SavedModel instead.
INFO: Found 1 target...
INFO: From Compiling external/protobuf/src/google/protobuf/compiler/js/embed.cc [for host]:
external/protobuf/src/google/protobuf/compiler/js/embed.cc:37:12: warning: unused variable 'output_file' [-Wunused-const-variable]
const char output_file[] = "well_known_types_embed.cc";
^
1 warning generated.
INFO: From Compiling external/protobuf/python/google/protobuf/pyext/message_factory.cc:
external/protobuf/python/google/protobuf/pyext/message_factory.cc:78:28: warning: ISO C++11 does not allow conversion from string literal to 'char *' [-Wwritable-strings]
static char* kwlist[] = {"pool", 0};
^
external/protobuf/python/google/protobuf/pyext/message_factory.cc:222:6: warning: ISO C++11 does not allow conversion from string literal to 'char *' [-Wwritable-strings]
{"pool", (getter)GetPool, NULL, "DescriptorPool"},
^
external/protobuf/python/google/protobuf/pyext/message_factory.cc:222:37: warning: ISO C++11 does not allow conversion from string literal to 'char *' [-Wwritable-strings]
{"pool", (getter)GetPool, NULL, "DescriptorPool"},
^
3 warnings generated.
This is only a small portion of the similar-looking messages that piled up. Even after all of them, the command never returns and I just get a blinking cursor on an empty line.
Can someone please provide me with exact instructions on what I should enter into the terminal to avoid these errors? I've been following Stack Overflow advice for weeks but continue to get errors.
macOS Sierra (MacBook Air)
What, specifically, should I enter into the terminal?
Everything I've done up to this point has been almost exactly what the Tensorflow.org website instructions say to do.
I installed for the first time using http://queirozf.com/entries/installing-cuda-tk-and-tensorflow-on-a-clean-ubuntu-16-04-install and not only was it a very simple process, but working with tf is really easy: just source <name_of_virtual_environment>/bin/activate and then run python/python3 through that.
Bear in mind that the walkthrough in the link is for GPU tensorflow; however, using the CPU tensorflow download for your Mac instead, with this virtual environment process, should work just fine.
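A minimal sketch of that flow for the CPU-only package, assuming virtualenv is installed and the generic macOS wheel works on your machine:
virtualenv --system-site-packages ~/tensorflow
source ~/tensorflow/bin/activate
pip install --upgrade tensorflow   # CPU-only package
python -c 'import tensorflow as tf; print(tf.__version__)'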
Since you do not have a GPU, do have SSE and AVX, and are on macOS Sierra, the instructions found on Google will NOT work with 1.3. I am befuddled as to why they do not provide an exact script to do this. Regardless, here is the answer to your question: http://www.josephmiguel.com/building-tensorflow-1-3-from-source-on-mac-osx-sierra-macbook-pro-i7-with-sse-and-avx/
/*
do each of these steps independently
will take around 1hr to complete all the steps regardless of machine type
*/
one time install
install anaconda3 pkg # manually download this and install the package
conda update conda
conda create -n dl python=3.6 anaconda
source activate dl
cd /
brew install bazel
pip install six numpy wheel
pip install --upgrade https://storage.googleapis.com/tensorflow/mac/cpu/protobuf-3.1.0-cp35-none-macosx_10_11_x86_64.whl
sudo -i
cd /
rm -rf tensorflow # if rerunning the script
cd /
git clone https://github.com/tensorflow/tensorflow
Step 1
cd /tensorflow
git checkout r1.3 -f
cd /
chmod -R 777 tensorflow
cd /tensorflow
./configure # accept all default settings
Step 2
// https://stackoverflow.com/questions/41293077/how-to-compile-tensorflow-with-sse4-2-and-avx-instructions
bazel build --config=opt --copt=-mavx --copt=-mavx2 --copt=-mfma //tensorflow/tools/pip_package:build_pip_package
Step 3
bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
pip install /tmp/tensorflow_pkg/tensorflow-1.0.1-cp36-cp36m-macosx_10_7_x86_64.whl
Step 4
cd ~
ipython
Step 5
import tensorflow as tf
hello = tf.constant('Hello, TensorFlow!')
sess = tf.Session()
print(sess.run(hello))
Step 6
pip uninstall tensorflow # pip uninstall takes the package name, not the wheel path
I am following:
http://deeplearning.net/software/theano/install_windows.html#install-windows
to install Theano. I just want to play with the code; I don't need a GPU to improve my speed.
I don't have an Nvidia card, and when I try to install CUDA, the installation fails. I watch as the installation tool deletes the files I need.
I am using Anaconda Python, so I commented out this line:
REM CALL %SCISOFT%\WinPython-64bit-2.7.9.4\scripts\env.bat
in the file C:\SciSoft\env.bat. I gave up and tried to install Theano with easy_install.
When I try to import Theano from Python, it fails with:
ton of stuff
Problem occurred during compilation with the command line below:
C:\SciSoft\TDM-GCC-64\bin\g++.exe -shared -g -march=bdver2 -mmmx -mno-3dnow -mss
more stuff
C:\Users\xxx\Anaconda\libs/python27.lib: error adding symbols: File in wrong format
collect2.exe: error: ld returned 1 exit status
---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
<ipython-input> in <module>()
----> 1 import theano
C:\Users\xxx\Anaconda\lib\site-packages\theano-0.7.0-py2.7.egg\theano\__init__.pyc in <module>()
Even more stuff
Exception: Compilation failed (return status=1): C:\Users\xxxx\Anaconda\li . collect2.exe: error: ld returned 1 exit status
If you don't need to run on a GPU, then don't worry about installing Visual Studio or CUDA. I think you just need Anaconda, but maybe also TDM-GCC.
From a clean environment, I install the latest version of Anaconda and then run
conda install mingw libpython
I'd recommend installing Theano from GitHub (the bleeding-edge version), since the "stable" release is not updated often and there are usually many significant improvements (especially for performance) in the bleeding-edge version compared to the stable one.
There is no need to perform all the steps in the "Configuring the Environment" section; just make sure your C++ compiler is in the PATH.
If Theano fails to work after these minimal installation instructions, I'd recommend solving each problem on a case-by-case basis instead of trying to run the full installation instructions provided in the documentation (which may be out of date).
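A minimal sketch of that bleeding-edge install (the pip-over-git route from the Theano docs), plus a quick import check:
pip install --upgrade git+https://github.com/Theano/Theano.git#egg=Theano
python -c "import theano; print(theano.__version__)"   # verifies the import compiles and runs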