How can I make use of intel-mkl with tensorflow

How can I make use of intel-mkl with tensorflow - python

I've seen a lot of documentation about making using of a CPU with tensorflow, however, I don't have a GPU. What I do have is a fairly capable CPU and a holing 5GB of intel math kernel, which, I hope, might help me speed up tensorflow a fair bit.
Does anyone know how I can "make" tensorflow use the intel-mlk ?

Build TensorFlow 1.2 from source and during the configuration step enable the support of MKL.
Note for Mac users
As of Dec. 2017, MKL only works on Linux. See https://tensorflow.org/performance/performance_guide#optimizing_for_‌cpu
Note: MKL was added as of TensorFlow 1.2 and currently only works on Linux. It also does not work when also using --config=cuda.

Since tensorflow uses Eigen, try to use an MKL enabled version of Eigen as described here:
define the EIGEN_USE_MKL_ALL macro before including any Eigen's header
link your program to MKL libraries (see the MKL linking advisor)
on a 64bits system, you must use the LP64 interface (not the ILP64 one)
So one way to do it is to follow the above steps to modify the source of tensorflow, recompile and install on your machine. While you're at it you should also try the Intel compiler, which might provide a decent performance boost by itself, if you set the correct flags: -O3 -xHost -ipo.

I know its been a whole year but I now see that there is an office WHEEL for Intel Optimized Tensorflow. Worth a try
https://software.intel.com/en-us/articles/intel-optimized-tensorflow-wheel-now-available

Related

Use GPU with opencv-python

I'm trying to use opencv-python with GPU on windows 10.
I installed opencv-contrib-python using pip and it's v4.4.0.42, I also have Cuda on my computer and in path.
Anyway, here is a (simple) code that I'm trying to compile:
import cvlib as cv
from cvlib.object_detection import draw_bbox
bbox, label, conf = cv.detect_common_objects(img,confidence=0.5,model='yolov3-worker',enable_gpu=True)
output_image = draw_bbox(img, bbox, label, conf)
First, here is the line that tell me that tf is ok with cuda:
2020-08-26 5:51:55.718555: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudart64_101.dll
but when I try to use my GPU to analyse the image, here is what happen:
[ WARN:0] global C:\Users\appveyor\AppData\Local\Temp\1\pip-req-build-j8nxabm_\opencv\modules\dnn\src\dnn.cpp (1429) cv::dnn::dnn4_v20200609::Net::Impl::setUpNet DNN module was not built with CUDA backend; switching to CPU
Is there a way to solve this without install opencv using cmake? It's a mess on windows...

The problem here is that version of opencv distributed with your system (Windows in this case) was not compiled with Cuda support. Therefore, you cannot use any cuda related function with this build.
If you want to have an opencv with cuda support, you will have to either compile it yourself (which may be tedious on windows) or find a prebuilt one somewhere. In case you want to go for the 1st solution, here is a link that may help you with the process: https://programming.vip/docs/compile-opencv-with-cuda-support-on-windows-10.html. Keep in mind that this will require you to install a bunch of SDK in the process.

Things seem to have changed a little since this question was asked initially:
From https://github.com/opencv/opencv-python
Option 1 - Main modules package: pip install opencv-python
Option 2 - Full package (contains both main modules and contrib/extra modules): pip install opencv-contrib-python (check contrib/extra modules listing from OpenCV documentation) ==> https://docs.opencv.org/master/
Sadly, not all of the modules listed above seem to be available in the "Full package" eg. cudafilters. If anyone knows any better, I for one would be very grateful to learn more.

For those who can get the same issue. As Harry mentionned, it's not possible to use GPU with opencv from pip, you have to "manually" build it using Cmake (for windows).
It's a bit tricky but there are many tutorials which are here to help you.
I spent two days trying to make cvlib works and that's why: one of the cudnn.dll curently available from Nvidia website is named:
Cudnn64_8.dll
and opencv (or tensorflow to be more precise) needs
Cudnn64_7.dll
in fact you just have to replace the 8 by the 7 ! ;)
That was the only hard part and I believed it came from the cmake process.
Thanks again Harry.

Confused with setting up ML and DL on GPU

My goal is to set up my PC for machine and deep learning through my GPU. I've read about all the different components however I can not connect the dots for what I need to do.
OS: Ubuntu 20.04
GPU: Nvidia RTX 2070 Super
Anaconda: 4.8.3
I've installed the nvidia-cuda-toolkit (10.1.243), but now what?
How does this integrate with jupyter notebook?
The 3 python modules I want to work with are:
turicreate - I've gotten this to run off CPU but not GPU
scikit-learn
tensorflow
matlab
I know cuDNN and pyCUDA fit in there somewhere.
Any help is appreciated. Thanks

First of all - I have the experience limited to ubuntu 18.04 and 16.xx and python DL frameworks. But I hope some sugestions will be helpfull.
If I were familiar with docker I would rather consider to use docker instead of setting-up everything from scratch. This approach is described in section about tensorflow container
If you decided to setup all components yourself please see this guideline
I used some contents from it for 18.04, succesfully.
be carefull with automatic updates. After the configuration is finished and tested protect it from being overwritten with newest version of CUDAor TensorRT.
Answering one of your sub-questions - How does this integrate with jupyter notebook? - it does not, becuase it is unneccesary. CUDA library cooperates with a framework such as Tensorflow, not with the Jupyter. Jupyter is just an editor and execution controller on the server side.

MXNet ML lib C++ segmentation fault on OS X

I have a problem with Apache MXNet machine learning library on OS X.
I have been able to run Python version of Lenet, convolutional neural network.
I installed these with pip under both Anaconda Python 2.7 and 3.6.
conda create -n mxnet27 python=2.7
conda info --envs
source activate mxnet27
conda list
pip install mxnet==0.12.1
But when I run C++ example files cpp-package/example/lenet.cpp I get the this segfault:
Segmentation fault: 11
This is the place in the code where the segfault is thrown:
Symbol conv1 =
Convolution("conv1", data, conv1_w, conv1_b, Shape(5, 5), 20);
I get similar segfault for the other C++ examples.
I have built MXNet on OS X 10.13.2
I disabled as many libraries as possible, e.g. OpenCV and CUDA.
On Simon Corston-Oliver suggestion I upgraded to MXNet 1.0.0, but that version did not compile with Clang on OS X. Error message:
operator_tune.h:150:36: note: add an explicit instantiation declaration to suppress this
warning if 'mxnet::op::OperatorTuneByType<float>::tuning_mode_' is explicitly instantiated in another translation unit
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/include/c++/v1/unordered_map:601:15: error: object of type 'std::__1::pair<int,
mxnet::test::perf::TimingInstrument::Info>' cannot be assigned because its copy assignment operator is implicitly deleted

I don't know of a specific issue with v0.12 that would lead to a segfault but before we dig in, I'd recommend upgrading to v1.0 which was released 2017-12-04.
If you still encounter the same problem with 1.0 we can work to debug.

I found a solution to compiling MXNet 1.0.0 posted here by helloniklas:
https://github.com/apache/incubator-mxnet/issues/9217
It involved only using make instead of CMake.
This solution worked me and compiled the code.
C++ examples runs without the seg fault, but documentation is scarce. I only got one of the to do training.

What does the error: `Loaded runtime CuDNN library: 5005 but source was compiled with 5103` mean?

I was trying to use TensorFlow with GPU and got the following error:
I tensorflow/core/common_runtime/gpu/gpu_device.cc:838] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla K20m, pci bus id: 0000:02:00.0)
E tensorflow/stream_executor/cuda/cuda_dnn.cc:347] Loaded runtime CuDNN library: 5005 (compatibility version 5000) but source was compiled with 5103 (compatibility version 5100). If using a binary install, upgrade your CuDNN library to match. If building from sources, make sure the library loaded at runtime matches a compatible version specified during compile configuration.
F tensorflow/core/kernels/conv_ops.cc:457] Check failed: stream->parent()->GetConvolveAlgorithms(&algorithms)
of course I am trying to fix this error (though this has already been asked Loaded runtime CuDNN library: 5005 (compatibility version 5000) but source was compiled with 5103 (compatibility version 5100)) but I'd like to understand the error. I always try to attempt solving (coding) problems myself before posting (asking for help) but I am having a hard time even starting this one because the error message seems a little cryptic/unclear to me and I can't seem to find a good resource to understand what the error means.
To understand the error I focused on the line that seems to be where the error starts:
Loaded runtime CuDNN library: 5005 (compatibility version 5000) but source was compiled with 5103 (compatibility version 5100).
After reading some github pages that seemed relevant I realized that reading the error as follows is actually more helpful:
Loaded runtime CuDNN library: 5005 but source was compiled with 5103.
removing the parenthesis makes the error make a bit more sense (though I'd like to understand/know what the role of the parenthesis is in the error message to easy the debugging) since it seems that it loaded CuDNN library 5005 (at the level of UNIX/OS) but the TensorFlow (for python) was compiled with what I would guess is version 5103. Obviously if the TensorFlow library is using an API according to 5103 but the "real" API to talk to the (cuda) deep learning library CuDNN is version 5005, its clear it would be a problem. Though they are just guesses of whats going on.
My first confusion is that as far as I can tell, there is no such thing CuDNN 5005 or 5103. It would be awesome to understand what that part of the error means for sure so that I can start trying to debug this for real. As far as I can tell when I use module list I am using:
cudnn/5.0
My second confusion is the parenthesis that I ignored and what they mean:
Loaded runtime CuDNN library: 5005 (compatibility version 5000)
but source was compiled with 5103 (compatibility version 5100)
I honestly have no idea idea what the "compatibility version XXXX" means. Maybe its suggestion to install version 5000 (whatever that means) for CuDNN (which is still confusing because there isn't a 5 thousand version of CuDNN) and compile a version of TensorFlow (somehow) that uses CuDNN version 5100.
Does someone know more precisely what the errors mean exactly (and make provide their solution to the question I linked?)

This is an approximate description of what is going on.
cuDNN has major releases that are numbered e.g. 4.0, 5.0, 5.1, etc.
These major releases may incorporate API changes. Therefore a program that uses cuDNN v4 (i.e. 4.0) may need some modifications to work with or use new features in cuDNN v5 (i.e. 5.0).
The major release is encoded in the first two digits of the 4-digit version number. So a cuDNN 4-digit version number of 5103 means it belongs to the 5.1 major release and has a sub-version number of 03. For compatibility purposes, such a release should be API-compatible with any other cuDNN library version of 51xx because they all belong to the 5.1 major release (this is not guaranteed to be strictly true AFAIK, but it is the general idea). Therefore any of these libraries with release numbering 51xx would have a compatibility version of 5100, to indicate that they belong to (and are (should be) compatible with) the 5.1 major release.
So when we are referring to a compatibility version (what major release is this library compatible with) we only need to specify the first two digits - 5000 indicates 5.0, 5100 indicates 5.1. But it is possible for a release to have a sub-release version number that is non-zero. There could be a variety of reasons for this, for example to allow for bug-fix releases and the like.
When a program (like tensorflow) is designed to use cuDNN, it will generally be coded to work with a particular version of cuDNN. In some cases, this can be handled at compile time, by "compiling against" a pariticular cuDNN version (and it's associated API, i.e. header files used when building tensorflow). Therefore, at compile time, a program like tensorflow can determine what version of the cuDNN API it was compiled against, and that is a 4-digit version (although generally speaking, only the compatibililty version i.e. the first two digits of the 4-digit version should really matter).
At runtime, you have a particular version of the cuDNN library (e.g. .so on linux) loaded on your machine somewhere. The version of that library can be determined, queried, and reported. If that actual library version does not match (at least from a compatibility version perspective) the version of the cuDNN library that tensorflow was compiled against, then that's a good indication that things may not work, and so tensorflow points this out when it is running:
Loaded runtime CuDNN library: 5005 but source was compiled with 5103.
This is tensorflow telling you "hey, I was designed (compiled) to work with cuDNN v5.1 but you are only giving me cuDNN 5.0 to work with".
Differences at the sub-version level should be less significant. If you know what you are doing, it may be ok to use cuDNN runtime version 5107 even if your tensorflow was compiled against version 5103. This is just a hypothetical example, but that would indicate that there is some difference in the library which was not intended to change proper functionality or behavior, or the API interface. It could be just a bug-fixed version of 5103, for example (hypothetically. This is an imaginary example.)
In the ideal case, you would build tensorflow against the version of cuDNN that you are using. If you have downloaded pre-built tensorflow packages, however, then you may witness this sort of message (since you presumably downloaded cuDNN separately). In that case, you should at least seek to match the cuDNN major version you are using against the compatibility version that tensorflow is expecting. In this particular example, you are not doing that.

Maybe you can download "cuDNN v5.1 for CUDA 8.0/7.5,and then install it.

Python pyopencl DLL load failed even with latest drivers

I've installed the latest CUDA and driver for my GPU. I'm using Python 2.7.10 on Win7 64bit.
I tried installing pyopencl from:
a. the unofficial windows binaries at http://www.lfd.uci.edu/~gohlke/pythonlibs/#pyopencl
b. by compiling my own after getting the sources from https://pypi.python.org/pypi/pyopencl
The installation was successful on both cases but I get the same error message once I try to import it:
>>> import pyopencl
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Python27\lib\site-packages\pyopencl-2015.1-py2.7-win-amd64.egg\pyope
cl\__init__.py", line 30, in <module>
import pyopencl._cl as _cl
ImportError: DLL load failed: The specified procedure could not be found.
>>>
I have Visual C++ Redistributable for Visual Studio 2015 installed from https://www.microsoft.com/en-us/download/details.aspx?id=48145 .
I also tried with 2 different versions of the GPU driver (including latest). Same thing.
A lot of people seem to get the same error and on some forums I read that by updating the GPU drivers to latest, it works fine. But not for me.
Anyone knows how to fix this?

I'm affraid there isn't one right answer to this problem. Each case is different. It depends on what is installed in the OS.
To track down such problems I normally use Dependency Walker.
In this specific case I would open _cl.pyd (usually in C:\Python27\Lib\site-packages\pyopencl) in Dependency Walker to check if there aren't any missing dependencies or if for example OpenCL.dll is actually the one which should be used. OpenCL.dll may be installed by other programs and their path added to PATH. Also OpenCL.dll in System32 may be too old. Basically trial and error renaming all but one OpenCL.dll into OpenCL.dll.bak and/or removing paths from PATH may get you there.

I had this same problem and discovered it was caused by AMD OpenCL.dll not having a function introduced in OpenCL 2.1. The Gohlke site only has OpenCL 2.1 and 1.2, while AMD drivers support 2.0.
Because I wanted 2.0, the easy fix was to manually replace the AMD System32/OpenCL.dll with the one from Intel SDK with experimental 2.1 support.

I had the same problem here, the way I resolved it was:
Make sure you have downloaded and installed the right OpenCL SDK. For example
Intel
NVIDIA
Open the Windows Command Prompt cmd and set the LIB and INCLUDE environment variables. For example
Intel:
set INCLUDE=C:\Program Files (x86)\IntelSWTools\system_studio_2020\OpenCL\sdk\include
set LIB=C:\Program Files (x86)\IntelSWTools\system_studio_2020\OpenCL\sdk\lib\x64
NVIDIA:
set LIB=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v3.2\lib\x64
set INCLUDE=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v3.2\include
now run pip install pyopencl --no-cache-dir
open Python and test import pyopencl
there might be a way to install PyOpenCL via pipwin or by using the --global-option to set the include and library folders. But I haven't succeeded so far.
P.S. The above mentioned NVIDIA OpenCL SDK (i.e., CUDA toolkit) turns out to be very outdated. please don't use it. If you have that installed, please uninstall and install the newer versions.

Try both the versions 1.2 and 2.1 I was trying with later and got this issue. Switched the whl and it works but used the Intel GPU. NVidia OpenCL.dll is 2.0 and that is not working still.
Just checked the cl.get_platforms array and found them
0. Intel
1. NVidia
pyopencl.Platform Intel(R) OpenCL & pyopencl.Device Intel(R) Core(TM) ... Intel(R) OpenCL
pyopencl.Platform NVIDIA CUDA & pyopencl.Device Quadro ... NVIDIA CUDA

I had the same problem in my Lenovo yoga 720. It has NVidia Geforce GTX1050 and intel i7 630 CPU/GPU.
I installed a long time ago update drivers and SDK for Nvidia CUDA. But now I what to run python OpenGL and I install intel SDK also. Pip install pyopencl without problems but import pyopengl give me dll load failure.
Solution was to change Windows\system32\opencl.dll to a new one. The old one was NVidia signed (you can see it in properties of file opencl.dll). The new one is Microsoft signed version 2.1.1.0 Khronos OpenCL ICD
I hope this is useful for you. Solution arrived after a long time trying a lot of things... but nothing worked except the new opencl.dll file

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.