Tensorflow can't find libcuda.so (CUDA 7.5) - python

I've installed the CUDA 7.5 toolkit, and Tensorflow inside an Anaconda env. The CUDA driver is also installed. The folder containing the .so libraries is in LD_LIBRARY_PATH. When I import tensorflow, I get the following error:
Couldn't open CUDA library libcuda.so. LD_LIBRARY_PATH:
/usr/local/cuda-7.5/lib64
In this folder, there exists a file named libcudart.so (which is actually a symbolic link to libcudart.so.7.5). So, just as a guess, I created a symbolic link to libcudart.so named libcuda.so. Now the library is found by Tensorflow, but as soon as I call tensorflow.Session(), I get the following error:
F tensorflow/stream_executor/cuda/cuda_driver.cc:107] Check failed: f != nullptr could not find cuInitin libcuda DSO; dlerror: /usr/local/cuda-7.5/lib64/libcudart.so.7.5: undefined symbol: cuInit
Any ideas?
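The error indicates that the two libraries export different symbols. A minimal Python sketch, assuming Linux, that asks the dynamic loader directly whether a library can be found and whether it exports cuInit. It uses only the standard ctypes module; the helper name library_exports is hypothetical. Because ctypes.CDLL goes through dlopen(), it should see roughly the same search path as a library loaded from a Python process started in the same shell.

```python
import ctypes
from typing import Optional


def library_exports(libname: Optional[str], symbol: str) -> bool:
    """Return True if `libname` can be loaded and exports `symbol`.

    ctypes.CDLL() calls dlopen(), so it searches LD_LIBRARY_PATH (as read
    when the process started), the ld.so cache and the default directories.
    Passing None inspects the symbols already loaded into this process.
    """
    try:
        lib = ctypes.CDLL(libname)
    except OSError:
        # dlopen() failed: the library was not found or could not be loaded.
        return False
    # Attribute access performs a dlsym() lookup; a missing symbol raises
    # AttributeError, which hasattr() turns into False.
    return hasattr(lib, symbol)


if __name__ == "__main__":
    # cuInit is a CUDA driver API entry point. The error above shows that
    # libcudart (the runtime) does not export it, so a libcuda.so ->
    # libcudart.so symlink cannot satisfy the lookup.
    for name in ("libcuda.so.1", "libcuda.so", "libcudart.so"):
        print(name, library_exports(name, "cuInit"))
```

On a working driver installation one would expect True for libcuda.so.1; a False there points to the loader not finding the driver library at all.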

For future reference, here is what I found out and what I did to solve this problem.
The system is Ubuntu 14.04 64-bit. The NVIDIA driver version I was trying to install was 367.35. The installation failed near the end with the message:
ERROR: Unable to load the kernel module 'nvidia-drm'
However, the CUDA samples compiled and ran with no problem, so the driver was at least partially installed correctly. But when I checked the version using:
cat /proc/driver/nvidia/version
The version I got was different (I don't remember exactly, but it was some 352.xx version).
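The version check above can be scripted. A rough Python sketch that compares the version of the loaded kernel module with the one you meant to install; the line format matched by the regular expression is an assumption based on the typical contents of /proc/driver/nvidia/version, and loaded_driver_version and check_driver are hypothetical names.

```python
import re
from pathlib import Path
from typing import Optional

# Assumed first-line format, e.g.:
# "NVRM version: NVIDIA UNIX x86_64 Kernel Module  367.35  <build date>"
_KERNEL_MODULE_RE = re.compile(r"Kernel Module\s+(\d+(?:\.\d+)+)")


def loaded_driver_version(text: str) -> Optional[str]:
    """Extract the kernel-module version from /proc/driver/nvidia/version text."""
    match = _KERNEL_MODULE_RE.search(text)
    return match.group(1) if match else None


def check_driver(expected: str, proc_file: str = "/proc/driver/nvidia/version") -> None:
    """Print whether the loaded kernel module matches the driver you installed."""
    path = Path(proc_file)
    if not path.exists():
        print("No NVIDIA kernel module appears to be loaded.")
        return
    loaded = loaded_driver_version(path.read_text())
    if loaded == expected:
        print(f"Loaded driver {loaded} matches the installed version.")
    else:
        # A mismatch (e.g. 352.xx loaded after installing 367.35) may point
        # to leftovers from an earlier driver installation.
        print(f"Loaded driver {loaded} differs from installed {expected}.")


if __name__ == "__main__":
    check_driver("367.35")
```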
So I figured I had better remove all traces of the driver and reinstall. I followed the instructions in the accepted answer here: https://askubuntu.com/questions/206283/how-can-i-uninstall-a-nvidia-driver-completely, except for the command that makes sure the nouveau driver is loaded at boot.
I finally reinstalled the most up-to-date NVIDIA driver (367.35). The installation finished with no errors and Tensorflow was able to load all libraries.
I think the problem began when someone who worked on the installation before me used apt-get to install the driver instead of the run script. I'm not sure, however.
PS: during installation, there is a warning:
The distribution-provided pre-install script failed! Are you sure you want to continue?
Looking at the logs, I could locate this pre-install script, and its content is simply:
# Trigger an error exit status to prevent the installer from overwriting
# Ubuntu's nvidia packages.
exit 1
so it seems ok to install despite this warning.

I had this error on a couple of Ubuntu 16.04 machines. I tried just updating the NVIDIA drivers and the Cuda toolkit, hoping that apt would take care of replacing the missing file, but that didn't happen.
Here's a hopefully clear explanation of how I fixed an error like:
...libcuda.so.1: cannot open shared object file: No such file or directory
Apparently, you are missing this libcuda.so.1 file.
If you look at other SO posts, you will discover that libcuda.so.1 is actually a symbolic link (fancy Unix term for a thing that looks like a file but is actually just a pointer to another file). Specifically, it is a symbolic link to a libcuda.so.# file that is part of the NVIDIA graphics drivers!!! (not part of the Cuda toolkit). So if you find wherever the package manager has put the libcuda.so.1 file on your system, you'll see it's pointing to this driver-related file:
$ ls /usr/lib/x86_64-linux-gnu/libcuda.so.1 -la
lrwxrwxrwx 1 root root 17 Oct 25 14:29 /usr/lib/x86_64-linux-gnu/libcuda.so.1 -> libcuda.so.410.73
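To locate the link and see what it finally resolves to without checking directories by hand, a small Python sketch. The directory list is an assumption (the first entry is where Ubuntu put the file in the listing above), and find_libcuda is a hypothetical helper.

```python
import os
from typing import Iterable, Optional, Tuple

# Assumed search locations; adjust for your distribution.
DEFAULT_DIRS = ("/usr/lib/x86_64-linux-gnu", "/usr/lib64", "/usr/local/cuda/lib64")


def find_libcuda(dirs: Iterable[str] = DEFAULT_DIRS,
                 name: str = "libcuda.so.1") -> Optional[Tuple[str, str]]:
    """Return (link path, fully resolved target) for the first `name` found."""
    for d in dirs:
        candidate = os.path.join(d, name)
        # lexists() is True even for a dangling symlink.
        if os.path.lexists(candidate):
            # realpath() follows the whole symlink chain, like `readlink -f`.
            return candidate, os.path.realpath(candidate)
    return None


if __name__ == "__main__":
    found = find_libcuda()
    print(found if found else "libcuda.so.1 not found in the listed directories")
```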
Okay, so you need to make a symbolic link like the one you found, but where?
I.e., where is Tensorflow looking for this libcuda.so.1? Obviously not where your package manager stuck it.
It turns out that Tensorflow looks in the "load library path" ($LD_LIBRARY_PATH).
You can see this path like so:
$ echo $LD_LIBRARY_PATH
and what you get back should include the installed Cuda toolkit:
/usr/local/cuda/lib64
(The exact path might vary on your system)
If not, you need to add the toolkit to $LD_LIBRARY_PATH using some shell command like this (from the NVIDIA Toolkit install manual):
export LD_LIBRARY_PATH=/usr/local/cuda/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
(If you don't find anything in /usr/local/cuda you might not have the toolkit installed.)
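The ${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}} part expands to ":" plus the old value only when the variable is already set and non-empty, so no stray empty entry is appended. A minimal Python sketch of the same logic plus a membership check; prepend_path and on_library_path are hypothetical names, and /usr/local/cuda/lib64 is the example path from above.

```python
import os
from typing import Optional


def prepend_path(new_dir: str, current: Optional[str]) -> str:
    """Mimic new_dir${VAR:+:${VAR}}: add ':old' only if VAR is set and non-empty."""
    return new_dir + (":" + current if current else "")


def on_library_path(directory: str, value: Optional[str]) -> bool:
    """True if `directory` is one of the colon-separated entries of `value`."""
    if not value:
        return False
    wanted = os.path.normpath(directory)
    return any(os.path.normpath(entry) == wanted
               for entry in value.split(":") if entry)


if __name__ == "__main__":
    current = os.environ.get("LD_LIBRARY_PATH")
    toolkit = "/usr/local/cuda/lib64"  # the exact path may vary on your system
    if on_library_path(toolkit, current):
        print(toolkit, "is already on LD_LIBRARY_PATH")
    else:
        print("export LD_LIBRARY_PATH=" + prepend_path(toolkit, current))
```

Note that assigning os.environ["LD_LIBRARY_PATH"] inside an already-running Python process generally does not affect later library loads, because the dynamic loader reads the variable at startup; export it in the shell before launching Python.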
Now that you know where on the $LD_LIBRARY_PATH Tensorflow looks for the Cuda toolkit, you can add a symbolic link in the toolkit directory:
sudo ln -s /usr/lib/x86_64-linux-gnu/libcuda.so.410.73 /usr/local/cuda/lib64/libcuda.so.1
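The same link can be created from Python, skipping the step if something already exists at the destination. A sketch with a hypothetical ensure_symlink helper; the 410.73 version number is copied from the listing above and will differ on other systems, and writing into /usr/local/cuda/lib64 usually requires root.

```python
import os


def ensure_symlink(target: str, link: str) -> str:
    """Create `link` -> `target` (like `ln -s target link`) unless it already exists."""
    if not os.path.exists(target):
        raise FileNotFoundError(f"target does not exist: {target}")
    if os.path.lexists(link):
        # Never overwrite an existing file or link; report where it points.
        return f"exists -> {os.path.realpath(link)}"
    os.symlink(target, link)
    return "created"


if __name__ == "__main__":
    try:
        print(ensure_symlink("/usr/lib/x86_64-linux-gnu/libcuda.so.410.73",
                             "/usr/local/cuda/lib64/libcuda.so.1"))
    except OSError as err:
        # Missing driver file, missing toolkit dir, or no permission.
        print("skipped:", err)
```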
Or you can just listen to other posts that don't explain what's going on but instead tell you to try installing more things in a bunch of different ways. Didn't work for me though :(

Related

Anaconda Prompt error 'The system cannot find the file specified' and condaHTTPerror

There are many error reports on 'The system cannot find the file specified', but almost all are very old threads whose solutions no longer work, and there is only a single question about a similar problem with the Anaconda prompt, with no solution.
When I open the Anaconda prompt, the error message appears, but the commands work fine, except when I create a new environment. I cannot install or update any packages/libraries inside the created environment, and the prompt gives the following error each time:
conda install keras
Fetching package metadata ...
CondaHTTPError: HTTP 000 CONNECTION FAILED for url https://conda.anaconda.org/anaconda------/repodata.json
Elapsed: -
An HTTP error occurred when trying to retrieve this URL.
HTTP errors are often intermittent, and a simple retry will get you on your way.
SSLError(SSLError("Can't connect to HTTPS URL because the SSL module is not available.",),)
Solutions I have tried:
Uninstalling and reinstalling anaconda3; I tried both the 32-bit and 64-bit versions. I also tried restarting the laptop after uninstalling and before installing fresh. I followed every step mentioned at https://docs.anaconda.com/anaconda/install/uninstall/
Running the following command from the command prompt: conda config --set ssl_verify no
Creating a pip.ini file inside the pip folder and adding the lines mentioned in this thread:
https://stackoverflow.com/a/52764896/11107306
Checking all drivers, including display and network drivers, for updates and updating them where necessary.
Adding Anaconda to the system PATH variable.
Downloading and installing the Win64OpenSSL application.
Cleaning conda using conda clean --all inside the environment and then trying to install again, with no success.
My system details
OS - Windows 8.1
Platform - win64
Anaconda - 2019.10
Conda version - 4.7.12
Python - 3.7.4 (It's a work laptop with Python 2.7.13 preinstalled as the default on the command prompt, which I cannot remove.)
NVIDIA GTX 960M (updated driver) with Cuda version 9
Please kindly help me; I have wasted almost a whole day on this. Or should I just go for an alternative? Kindly suggest a good alternative to Anaconda; I will need machine learning libraries for my project. Thank you in advance.
This solution from GitHub worked for me:
https://github.com/conda/conda/issues/8273 I copied the following files, libcrypto-1_1-x64.* and libssl-1_1-x64.*, from D:\Anaconda3\Library\bin to D:\Anaconda3\DLLs.
This worked very well for the CondaHTTPError. Now I can install using conda even within a created environment.
However, I still get the message 'The system cannot find the file specified' each time I open the prompt or run a command. How can I resolve this issue? Kindly help.
Copying the following files
libcrypto-1_1-x64.*
libssl-1_1-x64.*
from the Library\bin folder to the DLLs folder worked for me,
i.e. from C:\users\Alex\Anaconda3\Library\bin to C:\users\Alex\Anaconda3\DLLs.
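The manual copy described in these answers can be scripted. A Python sketch, assuming a standard Anaconda layout with Library\bin and DLLs under the install root; copy_ssl_dlls is a hypothetical name, the file patterns are the ones quoted above, and the example root is the one from this answer. As the follow-up above notes, this addressed the CondaHTTPError but not the 'The system cannot find the file specified' message.

```python
import glob
import os
import shutil
from typing import List

# File patterns quoted in the answers (OpenSSL 1.1 DLLs).
PATTERNS = ("libcrypto-1_1-x64.*", "libssl-1_1-x64.*")


def copy_ssl_dlls(anaconda_root: str) -> List[str]:
    """Copy the OpenSSL DLLs from <root>/Library/bin to <root>/DLLs.

    Returns the destination paths of the copied files.
    """
    src_dir = os.path.join(anaconda_root, "Library", "bin")
    dst_dir = os.path.join(anaconda_root, "DLLs")
    if not os.path.isdir(dst_dir):
        # Guard: copying to a missing directory would create a file named DLLs.
        raise NotADirectoryError(dst_dir)
    copied = []
    for pattern in PATTERNS:
        for src in glob.glob(os.path.join(src_dir, pattern)):
            # copy2 copies the file data and metadata and returns the new path.
            copied.append(shutil.copy2(src, dst_dir))
    return copied


if __name__ == "__main__":
    root = r"C:\users\Alex\Anaconda3"  # adjust to your own install
    if os.path.isdir(root):
        for path in copy_ssl_dlls(root):
            print("copied", path)
```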

Is there a way to install Pypy3 on Arago Project?

I am trying to install pypy3 in TI's ARM embedded system.
It is based on Linux, so I thought I could install pypy3 the same way as on a regular Linux system, but it did not work that way.
Here is what I've done:
Unzipped the zip file to /opt
Made a symlink to /usr/local/bin with ln -s opt/pypy3/bin usr/local/bin
Checked that the contents of opt/pypy3/bin are in usr/local/bin.
In each directory, libpypy3-c.so, pypy3, libpypy3-c.so.debug, and pypy3.debug exist.
Then when I try pypy main.py, it doesn't work.
It just says -sh: pypy: command not found
These are the usual steps for installing pypy on Linux.
Does anyone have an idea how to solve this problem?
Added
When I run pypy3 directly, like ./../opt/pypy3/bin/pypy3 main.py, I get this error message:
./../opt/pypy3/bin/pypy3: error while loading shared libraries: libbz2.so.1.0: cannot open shared object file: No such file or directory
What processor does the TI board use? The ARM downloads target the Raspberry Pi or equivalent. Try the hard-float one from here: http://www.pypy.org/download.html
It seems you need to add the location of the pypy binary to your $PATH
Reread the sentence here: "Linux binaries are only usable on the distributions written next to them". You will need to figure out how to get those dependencies on your distribution using your OS's package manager. If you work it out, please share your solution so others can reuse it.
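The loader error in the question (libbz2.so.1.0 not found) is an example of such a missing dependency. If ldd is available on the board, it lists every shared library a binary needs; a Python sketch that extracts the "not found" entries from its output. The parser works on captured text, so the ldd output can also be copied to another machine and checked there. The function names are hypothetical, and the line format assumed is the usual glibc ldd one.

```python
import subprocess
from typing import List


def missing_libraries(ldd_output: str) -> List[str]:
    """Return the sonames that ldd reports as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        # Typical ldd line: "\tlibbz2.so.1.0 => not found"
        if "=> not found" in line:
            missing.append(line.split("=>")[0].strip())
    return missing


def check_binary(path: str) -> List[str]:
    """Run ldd on `path` and list the libraries the loader cannot find."""
    result = subprocess.run(["ldd", path], capture_output=True, text=True)
    return missing_libraries(result.stdout)


if __name__ == "__main__":
    try:
        # Path from the question.
        print(check_binary("/opt/pypy3/bin/pypy3"))
    except FileNotFoundError:
        print("ldd is not available on this system")
```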

error running sphinx due to dyld: Library not loaded: @rpath/Python

I'm trying to use sphinx to build documentation for a package I'm developing. The commands I use used to work, so it looks like a link to a library has disappeared on my machine. I'm using a Mac.
> sphinx-autobuild . _build/html
dyld: Library not loaded: @rpath/Python
Referenced from: /Users/XXX/Library/Enthought/Canopy_64bit/User/bin/python
Reason: image not found
where XXX is my user name
Most similar question I can find is pyside-rcc "dyld: Library not loaded:..."
but the answer provided there is to copy a bunch of files from one directory to another, which seems to risk causing other configuration problems.
Other answers relate to issues with:
virtualenv (which I am not using): `dyld: Library not loaded` error preventing virtualenv from loading
brew + awscli (again, not used by me): How to resolve "dyld: Library not loaded: @executable_path.." error
Based on the questions I've seen, it looks like I should fix this by changing the PATH. Currently:
>echo $PATH
Applications/anaconda/bin:/Users/XXX/Library/Enthought/Canopy_64bit/User/bin:/Users/XXX/anaconda/bin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin:/Library/TeX/texbin:/opt/X11/bin
My .bash_profile is
# added by Anaconda 2.1.0 installer
export PATH="/Users/XXX/anaconda/bin:$PATH"
# Added by Canopy installer on 2016-08-08
# VIRTUAL_ENV_DISABLE_PROMPT can be set to '' to make the bash prompt show that Canopy is active, otherwise 1
alias activate_canopy="source '/Users/XXX/Library/Enthought/Canopy_64bit/User/bin/activate'"
VIRTUAL_ENV_DISABLE_PROMPT=1 source '/Users/XXX/Library/Enthought/Canopy_64bit/User/bin/activate'
# added by Anaconda3 4.3.1 installer
export PATH="/Applications/anaconda/bin:$PATH"
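Each installer line (including Canopy's activate script, assuming it works like a virtualenv activate) prepends its directory, so the last one in .bash_profile ends up first, which appears to match the echo $PATH output above. A Python sketch that lists every python executable on a PATH string in lookup order, to show which interpreter the shell will actually pick; python_candidates is a hypothetical helper.

```python
import os
from typing import List


def python_candidates(path_value: str, name: str = "python") -> List[str]:
    """List every `name` executable on a PATH string, in lookup order.

    The shell runs the first match; the later ones are shadowed.
    """
    found = []
    for entry in path_value.split(os.pathsep):
        if not entry:
            continue
        candidate = os.path.join(entry, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            found.append(candidate)
    return found


if __name__ == "__main__":
    for i, exe in enumerate(python_candidates(os.environ.get("PATH", ""))):
        print("ACTIVE  " if i == 0 else "shadowed", exe)
```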
The activate command that Canopy runs looks like part of the problem.
I fixed this by removing
alias activate_canopy="source '/Users/XXX/Library/Enthought/Canopy_64bit/User/bin/activate'"
VIRTUAL_ENV_DISABLE_PROMPT=1 source '/Users/XXX/Library/Enthought/Canopy_64bit/User/bin/activate'
from my .bash_profile. Still waiting to see if this breaks Canopy.

Configure error while installing graph-tool on ubuntu 14.04

I spent a whole day trying to find a solution for this. I am trying to install graph-tool on my Ubuntu 14.04 machine. Initially I was unable to succeed because I didn't have gcc 5. After installing it, I am trying the following:
./configure CXX='g++5'
and I get the following error:
===========================
Using python version: 2.7.6
===========================
checking for boostlib >= 1.54.0... configure: We could not detect the boost libraries (version 1.54 or higher). If you have a staged boost library (still not installed) please specify $BOOST_ROOT in your environment and do not give a PATH to --with-boost option. If you are sure you have boost installed, then check your version number looking in <boost/version.hpp>. See http://randspringer.de/boost for more documentation.
checking whether the Boost::Python library is available... no
configure: error: No usable boost::python found
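The configure message suggests checking the version number in <boost/version.hpp>. A Python sketch that decodes the BOOST_VERSION macro from that header; Boost encodes it as major * 100000 + minor * 100 + patch, so 105400 means 1.54.0. The header location /usr/include/boost/version.hpp is an assumption (the usual place for Ubuntu's -dev packages), and boost_version is a hypothetical name. A passing version check does not prove the Boost.Python library itself is installed, which is what the final error complains about.

```python
import re
from typing import Optional, Tuple

_BOOST_VERSION_RE = re.compile(r"^#\s*define\s+BOOST_VERSION\s+(\d+)", re.MULTILINE)


def boost_version(header_text: str) -> Optional[Tuple[int, int, int]]:
    """Decode BOOST_VERSION (major*100000 + minor*100 + patch) from version.hpp text."""
    match = _BOOST_VERSION_RE.search(header_text)
    if not match:
        return None
    v = int(match.group(1))
    return v // 100000, v // 100 % 1000, v % 100


if __name__ == "__main__":
    # Assumed location; a staged build would live under $BOOST_ROOT instead.
    path = "/usr/include/boost/version.hpp"
    try:
        with open(path) as f:
            found = boost_version(f.read())
    except FileNotFoundError:
        found = None
    if found is None:
        print("boost/version.hpp not found; are the -dev headers installed?")
    else:
        print("boost %d.%d.%d" % found)
        if found < (1, 54, 0):
            print("configure asks for boost >= 1.54.0")
```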
I found no solution to this problem on the graph-tool mailing list or on Stack Overflow. I would be really grateful if somebody could help me with this.
Thanks in advance.
In Debian, the libraries are almost always split in two packages: One containing the shared object and another one with "-dev" suffix which contains the header files. For cairomm you need to install the libcairomm-1.0-dev package, in addition to libcairomm-1.0.
And cairo support is optional. If you want to disable it, just pass the --disable-cairo to the configure script.
Source: https://lists.skewed.de/pipermail/graph-tool/2013-November/001094.html
There are some issues with the boost package on Ubuntu 14.04 and some of the graph-tool functions (see graph-tool - k-shortest path - boost::coroutine was not found at compile-time and http://main-discussion-list-for-the-graph-tool-project.982480.n3.nabble.com/Debian-package-and-boost-at-compile-time-td4026383.html ). Currently it seems necessary to compile boost from source for graph-tool to work fully, until a newer version of boost is uploaded to the repository.
Once this bug is fixed (https://bugs.launchpad.net/ubuntu/+source/boost1.54/+bug/1529289) it will no longer be a problem.

Problems building node.js on Cygwin, please help

I'm trying to get node.js running on Windows 7. I have no experience with Linux so I've just been blindly following instructions from tutorials I've found, but I'm still unable to build node.js.
What I did:
Installed Cygwin (the entire package set)
Attempted to build node.js
This is the error I first got:
I then followed the instructions from two other similar sites, and they all resulted in this error (could getting several versions of node have caused me more problems? I'm completely clueless on this).
I read somewhere that the Windows version of Python could be causing the problem, so I uninstalled my Python 2.7 and added C:\cygwin\bin to the PATH.
That still didn't work, and I read elsewhere that I'm supposed to run rebaseall, so I tried that, but it also gave an error:
That's where I'm at now. Have any steps I've taken exacerbated the situation?
Add -e '/\/sys-root\/mingw\/bin/d' at line 110 of the /bin/rebaseall file.
Then re-run rebaseall -v and you shouldn't get the error anymore.
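The added expression is a sed "delete" command: every input line containing /sys-root/mingw/bin is dropped. Assuming line 110 sits inside the sed call that filters the list of DLLs rebaseall is about to rebase, as the answer implies, this makes rebaseall skip the MinGW DLLs. A Python illustration of the same filtering on a made-up file list; drop_matching is a hypothetical name.

```python
import re
from typing import Iterable, List

# The sed address /\/sys-root\/mingw\/bin/ as a Python regex
# (the slashes only need escaping inside sed's /.../ delimiters).
MINGW_BIN = re.compile(r"/sys-root/mingw/bin")


def drop_matching(lines: Iterable[str], pattern: "re.Pattern[str]" = MINGW_BIN) -> List[str]:
    """Keep only the lines that do NOT match, like sed -e '/pattern/d'."""
    return [line for line in lines if not pattern.search(line)]


if __name__ == "__main__":
    # Made-up DLL paths for illustration only.
    dlls = [
        "/usr/bin/cygz.dll",
        "/usr/i686-w64-mingw32/sys-root/mingw/bin/zlib1.dll",
    ]
    print(drop_matching(dlls))
```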
See this pretty helpful blog posting - Node on CygWin doesn't work for Node v0.2.5. Use the latest v0.4.0 version instead.
Also consider the post's recommendation of compiling against MinGW instead of in CygWin.
First of all, why did you check out such an old release v0.2.5? When I did it a few weeks ago I just took the latest and ended up with 0.5.0pre, but it would also be reasonable to specify v0.4.3. For instance, type git clone git://github.com/joyent/node.git to download node, and then:
cd node
./configure
make install
Secondly, do not rebase by running ash from the Cygwin shell. Instead, shut down all Cygwin processes, then use Windows Explorer to open the ash.exe binary. Since I have a Windows 7 system without node.js, I decided to follow my own instructions and build it. Not so easy. I ran into some weird DLL issues that all went away when I ran ./rebaseall followed by ./perlrebase from the ash prompt. It seems that rebaseall alone is not sufficient anymore.
Thirdly, there is a message that makes it sound like you don't have a C compiler. Some googling will lead you to sites telling which Cygwin packages you need, but at minimum install the g++ compiler and that should pull in C as a dependency.
When I did this I simply ran configure and every time there was an error, installed one more Cygwin package to supply the missing piece. Even OpenSSL is available.
What I just found is that removing the Windows-based install of Python fixes it. After uninstalling it, everything is peachy.
I like Cygwin a lot, but recent releases have become pretty unreliable. Some packages just won't build, and some "standard" apps don't work, e.g. gvim's "save as" bombs out on my installation.
A possible solution would be run one of the better Linux distributions (ubuntu, fedora, suse etc.) either as a virtual machine or a dual boot setup and do the build inside linux.
