Trouble Building Python OpenCV with Cuda on Windows 10


I'm trying to build OpenCV 3.4.15 with Cuda support for Python 3.9.5 on Windows 10 and am getting stuck. I've followed a number of tutorials and I think I'm close, but I seem to be missing something.
I've run CMake with WITH_CUDA enabled, and everything seems to be working great. I open OpenCV.sln, build ALL_BUILD, and then build INSTALL. Everything builds successfully, and the Cuda support on the C++ side works great. On the Python side, I end up with files added to the following path, and non-Cuda OpenCV in Python seems to work fine.
C:\Users\Name\AppData\Local\Programs\Python\Python39\Lib\site-packages\cv2
If I run cv2.getBuildInformation() I see the following, which makes me think I was successful (this didn't show up with the pip install version of OpenCV).
NVIDIA CUDA: YES (ver 11.4, CUFFT CUBLAS)
NVIDIA GPU arch: 35 37 50 52 60 61 70 75 80 86
NVIDIA PTX archs:
However, if I run cv2.cuda.getCudaEnabledDeviceCount() I get the following.
AttributeError: module 'cv2.cuda' has no attribute 'getCudaEnabledDeviceCount'
When I run the same command with OpenCV installed through pip, this successfully outputs 0.
One thing I noticed is that the site-packages/cv2 directory I created from source is only about 6MB, while the site-packages/cv2 directory I get from pip install is nearly 100MB, so something seems off.

It seems that building OpenCV 4.5.4 instead of OpenCV 3.4.15 resolved the issue. The Python bindings to the OpenCV Cuda implementations are now available.
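The two checks described in the question (build information versus the actual Python bindings) can be separated with a small helper. This is only a sketch: the function names cuda_build_summary and python_cuda_bindings_present are made up for illustration, and the parsing assumes the build information uses the "NVIDIA CUDA:" and "NVIDIA GPU arch:" labels quoted above.

```python
import re


def cuda_build_summary(build_info: str) -> dict:
    """Extract the CUDA-related fields from cv2.getBuildInformation() text."""
    summary = {"cuda_enabled": False, "cuda_version": None, "gpu_archs": []}
    for line in build_info.splitlines():
        line = line.strip()
        match = re.match(r"NVIDIA CUDA:\s+(YES|NO)(?:\s+\(ver\s+([\d.]+))?", line)
        if match:
            summary["cuda_enabled"] = match.group(1) == "YES"
            summary["cuda_version"] = match.group(2)
            continue
        match = re.match(r"NVIDIA GPU arch:\s*(.*)", line)
        if match:
            summary["gpu_archs"] = match.group(1).split()
    return summary


def python_cuda_bindings_present(cv2_module) -> bool:
    """True if the loaded cv2 module exposes cv2.cuda.getCudaEnabledDeviceCount."""
    cuda = getattr(cv2_module, "cuda", None)
    return cuda is not None and hasattr(cuda, "getCudaEnabledDeviceCount")


if __name__ == "__main__":
    try:
        import cv2
    except ImportError:
        print("cv2 is not importable in this interpreter")
    else:
        print(cuda_build_summary(cv2.getBuildInformation()))
        print("Python CUDA bindings:", python_cuda_bindings_present(cv2))
```

If the summary reports CUDA as enabled but the second check prints False, the C++ libraries were built with CUDA while the Python wrappers for the CUDA modules were not generated, which matches the symptom in the question.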

Related

Tensorflow 2.4.1 - Couldn't invoke ptxas.exe

I try to run Tensorflow with GPU support (GTX 1660 SUPER).
I created an environment using anaconda, then installed cudatoolkit (version 11.0.221) and tensorflow-gpu (version 2.4.1). Afterwards, I downloaded cuDNN (version 8.0.4), and copied all files from cuDNN's bin folder to my environment's bin folder at anaconda3\envs\<env name>\Library\bin.
In my script, I've set the memory limit to my GPU's memory using tf.config.experimental.set_memory_growth.
When I run the script (which uses convolutional algorithms), I get a warning that says Couldn't invoke ptxas.exe --version, which comes after a Call to CreateProcess failed. Error code: 2 error.
After the launch failure, I get: Relying on driver to perform ptx compilation. Modify $PATH to customize ptxas location.
I've already tried switching to cuDNN version 8.1.1.
How do I fix this?

I got a new fix for this.
First I tried using tensorflow=2.3, cudnn=7.6.5 and cudatoolkit=10.1 as mentioned in previous answers. However, every time I started training a model, the process stalled and the training seemed to be stuck in epoch 1.
I then managed to include ptxas in my conda environment by running conda install -c nvidia cuda-nvcc. The packages I am using are:
tensorflow=2.9, cudnn=8.1.0, cudatoolkit=11.2.2, cuda-nvcc=11.7.99 and python=3.9
I am running everything on windows 10 flawlessly now.
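Error code 2 from CreateProcess means the file was not found, so a quick way to confirm whether a fix like this worked is to check whether ptxas is reachable through PATH from the same environment. A minimal sketch using only the standard library (the helper name find_ptxas is hypothetical):

```python
import shutil


def find_ptxas(search_path=None):
    """Return the full path to ptxas if it is on the search path, else None.

    search_path has the same format as the PATH environment variable;
    None means "use the current PATH". On Windows, shutil.which also
    tries the extensions in PATHEXT, so ptxas.exe is matched.
    """
    return shutil.which("ptxas", path=search_path)


if __name__ == "__main__":
    location = find_ptxas()
    if location is None:
        print("ptxas is not on PATH; TensorFlow will fall back to the driver")
    else:
        print("ptxas found at", location)
```

Run it with the conda environment activated, since activation is what puts the environment's directories on PATH.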

For the benefit of the community, adding @Zuk Levinson's comment:
The issue was solved by using
tensorflow=2.3, cudnn=7.6.5 and cudatoolkit=10.1

Use GPU with opencv-python

I'm trying to use opencv-python with GPU on windows 10.
I installed opencv-contrib-python using pip (v4.4.0.42). I also have Cuda installed on my computer and in the path.
Anyway, here is the (simple) code that I'm trying to run:
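import cv2
import cvlib as cv
from cvlib.object_detection import draw_bbox
# "input.jpg" is a placeholder path; the original snippet did not show how img was loaded
img = cv2.imread("input.jpg")
bbox, label, conf = cv.detect_common_objects(img, confidence=0.5, model='yolov3-worker', enable_gpu=True)
output_image = draw_bbox(img, bbox, label, conf)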
First, here is the line that tells me that TensorFlow is OK with Cuda:
2020-08-26 5:51:55.718555: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudart64_101.dll
But when I try to use my GPU to analyse the image, here is what happens:
[ WARN:0] global C:\Users\appveyor\AppData\Local\Temp\1\pip-req-build-j8nxabm_\opencv\modules\dnn\src\dnn.cpp (1429) cv::dnn::dnn4_v20200609::Net::Impl::setUpNet DNN module was not built with CUDA backend; switching to CPU
Is there a way to solve this without installing opencv using cmake? It's a mess on windows...

The problem here is that the OpenCV build you installed (the prebuilt pip package for Windows in this case) was not compiled with Cuda support. Therefore, you cannot use any Cuda-related function with this build.
If you want OpenCV with Cuda support, you will have to either compile it yourself (which may be tedious on Windows) or find a prebuilt one somewhere. If you want to go for the first solution, here is a link that may help you with the process: https://programming.vip/docs/compile-opencv-with-cuda-support-on-windows-10.html. Keep in mind that this will require you to install a number of SDKs in the process.

Things seem to have changed a little since this question was asked initially:
From https://github.com/opencv/opencv-python
Option 1 - Main modules package: pip install opencv-python
Option 2 - Full package (contains both main modules and contrib/extra modules): pip install opencv-contrib-python (check contrib/extra modules listing from OpenCV documentation) ==> https://docs.opencv.org/master/
Sadly, not all of the modules listed there seem to be available in the "Full package", e.g. cudafilters. If anyone knows any better, I for one would be very grateful to learn more.

For those who run into the same issue: as Harry mentioned, it's not possible to use the GPU with opencv from pip; you have to build it "manually" using CMake (on Windows).
It's a bit tricky, but there are many tutorials to help you.
I spent two days trying to make cvlib work, and here is why: one of the cuDNN DLLs currently available from the Nvidia website is named:
Cudnn64_8.dll
and opencv (or tensorflow to be more precise) needs
Cudnn64_7.dll
In fact, you just have to replace the 8 with a 7 ! ;)
That was the only hard part, and I had believed it came from the CMake process.
Thanks again Harry.

EXE made from Python file which uses Tensorflow-GPU does not use GPU when deployed

I have a Python file which uses TensorFlow GPU. It uses the GPU when I run the file from the console using python MyFile.py.
However, when I convert it into an exe using PyInstaller, it converts and runs successfully, but it no longer uses the GPU when I run the exe. This happens on a system which was not used for developing MyFile.py. On the system that was used for development, the exe uses just 40-50% of the GPU, compared with 90% when I run the Python script.
My application also has a small UI made using tkinter.
Though the application runs fine on the CPU, it is incredibly slow. (I am not using the --onefile flag in PyInstaller.) Although the machine has a GPU, the application is not using it.
My questions are:
How do I overcome this issue? Do I need to install any CUDA or cuDNN toolkits on my destination computer?
(Once the main question is solved) Can I use a 1050 Ti in development and a 2080 Ti in the destination computer, if the cuDNN and CUDA versions are the same?
Tensorflow Version : 1.14.0 (I know 2.x is out there, but this works perfectly fine for me.)
GPU : GeForce GTX 1050 ti ( In development as well as deployment.)
CUDA Toolkit : 10.0
CuDnn : v7.6.2 for cuda 10.0
pyinstaller version : 3.5
Python version : 3.6.5

As I also answered here, according to the GitHub issues in the official repository (here and here for example), CUDA libraries are usually loaded dynamically at run time rather than at link time, so they are typically not included in the final exe file (or folder). As a result, the generated exe file won't work on a machine without CUDA installed. The solution (please refer to the linked issues too) is to put the DLLs necessary to run the exe in its dist folder (if generated without the --onefile option) or to install the CUDA runtime on the target machine.
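Copying the DLLs into the dist folder can be scripted as a post-build step. The sketch below is an assumption-laden illustration, not an official PyInstaller mechanism: the helper name copy_cuda_dlls and the default file name patterns are hypothetical, and the exact set of DLLs depends on the CUDA and cuDNN versions your TensorFlow build loads.

```python
import glob
import os
import shutil

# Assumed name patterns for the CUDA/cuDNN runtime DLLs; adjust to your versions.
DEFAULT_PATTERNS = (
    "cudart64_*.dll", "cublas64_*.dll", "cufft64_*.dll", "curand64_*.dll",
    "cusolver64_*.dll", "cusparse64_*.dll", "cudnn64_*.dll",
)


def copy_cuda_dlls(cuda_bin_dir, dist_dir, patterns=DEFAULT_PATTERNS):
    """Copy DLLs matching the patterns from cuda_bin_dir into dist_dir.

    Returns the list of copied file names, in pattern order.
    """
    os.makedirs(dist_dir, exist_ok=True)
    copied = []
    for pattern in patterns:
        for source in sorted(glob.glob(os.path.join(cuda_bin_dir, pattern))):
            shutil.copy2(source, dist_dir)
            copied.append(os.path.basename(source))
    return copied
```

A call would look like copy_cuda_dlls(r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0\bin", r"dist\MyFile"), where both paths are examples that depend on your installation and build output.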

Tensorflow Compile Runs For A Long Time

So I am trying to compile TensorFlow from the source (using a clone from their git repo from 2019-01-31). I installed Bazel from their shell script (https://github.com/bazelbuild/bazel/releases/download/0.21.0/bazel-0.21.0-installer-linux-x86_64.sh).
I executed ./configure in the tensorflow code and provided the default settings except for adding my machine specific -m options (-mavx2 -mfma) and pointing python to the correct python3 location (/usr/bin/py3). I then ran the following command as per the tensorflow instructions:
bazel build --config=opt //tensorflow/tools/pip_package:build_pip_package //tensorflow:libtensorflow_framework.so //tensorflow:libtensorflow.so
Now that continues to run and run, I haven't seen it complete yet (though I am limited to letting it run for a maximum of about 10 hours). It produces a ton of INFO: warnings regarding signed and unsigned integers and control reaching the end of non-void functions. None of these appear fatal. Compilation continues to tick with the two numbers continuing to grow ('[N,NNN / X,XXX] 4 actions running') and files ticking through 'Compiling'.
The machine is an EC2 instance with ~16GiB of RAM, CPU is 'Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz' with I believe 4 cores, plenty of HDD space (although compilation seems to eat QUITE a bit, > 1GiB)
Any ideas on what's going on here?

Unfortunately, some programs can take a long time to compile. A couple of hours of compilation is not strange for tensorflow on your setup.
There are reports of it taking 50 minutes on a considerably faster machine.
A solution to this problem is to use pre-compiled binaries that are available with pip, instructions can be found here: https://www.tensorflow.org/install/pip.html
Basically you can do this:
pip install tensorflow
If you require a specific older version, like 1.15, you can do this:
pip install tensorflow==1.15
For gpu support you add -gpu to the package name, like this:
pip install tensorflow-gpu
And:
pip install tensorflow-gpu==1.15

Python pyopencl DLL load failed even with latest drivers

I've installed the latest CUDA and driver for my GPU. I'm using Python 2.7.10 on Win7 64bit.
I tried installing pyopencl from:
a. the unofficial windows binaries at http://www.lfd.uci.edu/~gohlke/pythonlibs/#pyopencl
b. by compiling my own after getting the sources from https://pypi.python.org/pypi/pyopencl
The installation was successful on both cases but I get the same error message once I try to import it:
>>> import pyopencl
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python27\lib\site-packages\pyopencl-2015.1-py2.7-win-amd64.egg\pyopencl\__init__.py", line 30, in <module>
    import pyopencl._cl as _cl
ImportError: DLL load failed: The specified procedure could not be found.
>>>
I have Visual C++ Redistributable for Visual Studio 2015 installed from https://www.microsoft.com/en-us/download/details.aspx?id=48145 .
I also tried with 2 different versions of the GPU driver (including latest). Same thing.
A lot of people seem to get the same error and on some forums I read that by updating the GPU drivers to latest, it works fine. But not for me.
Does anyone know how to fix this?

I'm afraid there isn't one right answer to this problem. Each case is different. It depends on what is installed in the OS.
To track down such problems I normally use Dependency Walker.
In this specific case I would open _cl.pyd (usually in C:\Python27\Lib\site-packages\pyopencl) in Dependency Walker to check if there aren't any missing dependencies or if for example OpenCL.dll is actually the one which should be used. OpenCL.dll may be installed by other programs and their path added to PATH. Also OpenCL.dll in System32 may be too old. Basically trial and error renaming all but one OpenCL.dll into OpenCL.dll.bak and/or removing paths from PATH may get you there.
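Before renaming anything, it can help to list every copy of OpenCL.dll that is reachable through PATH. The following sketch uses only the standard library; the helper name find_dll_copies is hypothetical. It only walks PATH, while the Windows loader also checks other locations (such as the directory of the loading module), so treat the output as a starting point.

```python
import os


def find_dll_copies(dll_name, search_path=None):
    """Return paths of all files named dll_name (case-insensitive) on the search path.

    search_path has the same format as the PATH environment variable;
    None means "use the current PATH". Results keep the search order.
    """
    if search_path is None:
        search_path = os.environ.get("PATH", "")
    wanted = dll_name.lower()
    hits = []
    seen = set()
    for directory in search_path.split(os.pathsep):
        if not directory or not os.path.isdir(directory):
            continue
        key = os.path.normcase(os.path.abspath(directory))
        if key in seen:
            continue  # skip directories listed more than once
        seen.add(key)
        try:
            entries = os.listdir(directory)
        except OSError:
            continue  # unreadable directory
        for entry in entries:
            if entry.lower() == wanted:
                hits.append(os.path.join(directory, entry))
    return hits


if __name__ == "__main__":
    for path in find_dll_copies("OpenCL.dll"):
        print(path)
```

If more than one path is printed, the first one in the list is the most likely candidate to inspect in Dependency Walker.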

I had this same problem and discovered it was caused by AMD OpenCL.dll not having a function introduced in OpenCL 2.1. The Gohlke site only has OpenCL 2.1 and 1.2, while AMD drivers support 2.0.
Because I wanted 2.0, the easy fix was to manually replace the AMD System32/OpenCL.dll with the one from Intel SDK with experimental 2.1 support.

I had the same problem here. The way I resolved it was:
1. Make sure you have downloaded and installed the right OpenCL SDK, for example the one from Intel or NVIDIA.
2. Open the Windows Command Prompt (cmd) and set the LIB and INCLUDE environment variables. For example:
Intel:
set INCLUDE=C:\Program Files (x86)\IntelSWTools\system_studio_2020\OpenCL\sdk\include
set LIB=C:\Program Files (x86)\IntelSWTools\system_studio_2020\OpenCL\sdk\lib\x64
NVIDIA:
set LIB=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v3.2\lib\x64
set INCLUDE=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v3.2\include
3. Now run pip install pyopencl --no-cache-dir
4. Open Python and test import pyopencl
There might be a way to install PyOpenCL via pipwin or by using the --global-option to set the include and library folders, but I haven't succeeded so far.
P.S. The above-mentioned NVIDIA OpenCL SDK (i.e., CUDA toolkit) turns out to be very outdated. Please don't use it. If you have it installed, please uninstall it and install a newer version.
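The same steps can be driven from Python instead of typing set commands in cmd. This is a sketch under assumptions: the helper names build_env_with_opencl_sdk and install_pyopencl are hypothetical, and unlike the plain set commands it prepends the SDK directories to any existing INCLUDE and LIB values rather than overwriting them.

```python
import os
import subprocess
import sys


def build_env_with_opencl_sdk(include_dir, lib_dir, base_env=None):
    """Return a copy of the environment with INCLUDE and LIB pointing at the SDK."""
    env = dict(os.environ if base_env is None else base_env)
    for name, value in (("INCLUDE", include_dir), ("LIB", lib_dir)):
        existing = env.get(name)
        # Put the SDK directory first so it wins over earlier settings.
        env[name] = value + os.pathsep + existing if existing else value
    return env


def install_pyopencl(include_dir, lib_dir):
    """Run pip install pyopencl --no-cache-dir with the SDK paths set; return the exit code."""
    env = build_env_with_opencl_sdk(include_dir, lib_dir)
    command = [sys.executable, "-m", "pip", "install", "pyopencl", "--no-cache-dir"]
    return subprocess.run(command, env=env, check=False).returncode
```

Calling install_pyopencl with the include and lib\x64 directories of your SDK should be equivalent to steps 2 and 3, with the advantage that the variables are set only for the pip process.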

Try both versions, 1.2 and 2.1. I was trying with the latter and got this issue. I switched the whl and it works, but it uses the Intel GPU. NVidia's OpenCL.dll is 2.0 and that is still not working.
I just checked the cl.get_platforms array and found them:
0. Intel
1. NVidia
pyopencl.Platform Intel(R) OpenCL & pyopencl.Device Intel(R) Core(TM) ... Intel(R) OpenCL
pyopencl.Platform NVIDIA CUDA & pyopencl.Device Quadro ... NVIDIA CUDA
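A listing like the one above can be produced with a few lines once pyopencl imports. In this sketch the helper name describe_platforms is hypothetical; it relies only on pyopencl's get_platforms, Platform.get_devices and the name attributes.

```python
def describe_platforms(platforms):
    """Return one 'index. platform: devices' line per OpenCL platform."""
    lines = []
    for index, platform in enumerate(platforms):
        device_names = [device.name for device in platform.get_devices()]
        devices = ", ".join(device_names) or "no devices"
        lines.append(f"{index}. {platform.name}: {devices}")
    return lines


if __name__ == "__main__":
    try:
        import pyopencl as cl
    except ImportError as exc:
        # A "DLL load failed" error also surfaces here as ImportError.
        print("pyopencl could not be imported:", exc)
    else:
        print("\n".join(describe_platforms(cl.get_platforms())))
```

The index printed for each platform is its position in the list returned by get_platforms, which is useful when you want to pick the NVIDIA platform explicitly instead of the default.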

I had the same problem on my Lenovo Yoga 720. It has an NVidia GeForce GTX 1050 and an Intel i7 630 CPU/GPU.
I installed updated drivers and the SDK for Nvidia CUDA a long time ago. But now I want to run Python OpenCL, so I installed the Intel SDK as well. pip install pyopencl ran without problems, but import pyopencl gave me a DLL load failure.
The solution was to replace Windows\system32\opencl.dll with a new one. The old one was signed by NVidia (you can see this in the properties of the file opencl.dll). The new one is the Microsoft-signed version 2.1.1.0, Khronos OpenCL ICD.
I hope this is useful for you. The solution arrived after a long time trying a lot of things... but nothing worked except the new opencl.dll file.
