I have an extremely weird issue: if I run PyTorch model training from PyCharm, it works fine, but when I run the same code in the same environment from a terminal, it freezes the screen. All windows become non-interactable. The freeze affects only my session, not other users', and for them top shows that the model is no longer training. The issue is consistent and reproducible across machines, users, and GPU slots.
All dependencies are installed in a conda environment, dl_segm_auto. In PyCharm I have it selected as the interpreter, and parameters are passed through Run -> Edit Configurations.
From the terminal, I run:
conda activate dl_segm_auto
python training.py [parameters]
After the first epoch the entire remote session freezes.
Suggestions greatly appreciated!
The issue was caused by matplotlib's backend taking over the screen on Linux. Any of the following solves the problem:
Installing PyQt5 (which changes the environment's default matplotlib backend).
Running from PyCharm (which selects a backend on startup).
Calling matplotlib.use('Qt5Agg') (other backends may also work) at the start of the plotting functions or of the top-level script, as in the sketch below.
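A minimal sketch of the third option, assuming PyQt5 is installed ('Agg' is an alternative if no interactive display is needed at all):

```python
# Select the backend before pyplot is imported; this must run
# before any other module pulls in matplotlib.pyplot.
import matplotlib
matplotlib.use('Qt5Agg')   # or 'Agg' for a fully non-interactive run
import matplotlib.pyplot as plt

def plot_loss(losses):
    # Regular plotting code follows; the backend choice above prevents
    # the default backend from grabbing the remote session.
    plt.plot(losses)
    plt.savefig('loss.png')
```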
I am trying to run TensorFlow with GPU support (GTX 1660 SUPER).
I created an environment using Anaconda, then installed cudatoolkit (version 11.0.221) and tensorflow-gpu (version 2.4.1). Afterwards, I downloaded cuDNN (version 8.0.4) and copied all files from cuDNN's bin folder to my environment's bin folder at anaconda3\envs\<env name>\Library\bin.
In my script, I've configured GPU memory growth using tf.config.experimental.set_memory_growth.
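For reference, the memory-growth call is set up roughly like this (a minimal sketch; the rest of my script is omitted):

```python
import tensorflow as tf

# Ask TensorFlow to allocate GPU memory on demand instead of
# grabbing the whole card at startup.
gpus = tf.config.list_physical_devices('GPU')
for gpu in gpus:
    tf.config.experimental.set_memory_growth(gpu, True)
```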
When I run the script (which uses convolutional algorithms), I get a warning that says Couldn't invoke ptxas.exe --version, which comes after a Call to CreateProcess failed. Error code: 2 error.
After the launch failure, I get: Relying on driver to perform ptx compilation. Modify $PATH to customize ptxas location.
I've already tried switching to cuDNN version 8.1.1.
How do I fix this?
I found a new fix for this.
First I tried using tensorflow=2.3, cudnn=7.6.5 and cudatoolkit=10.1 as mentioned in previous answers. However, every time I started training a model, the process would stall and the training seemed to be stuck at epoch 1.
I then managed to include ptxas in my conda environment by running conda install -c nvidia cuda-nvcc. The packages I am using are:
tensorflow=2.9, cudnn=8.1.0, cudatoolkit=11.2.2, cuda-nvcc=11.7.99 and python=3.9
I am running everything on Windows 10 flawlessly now.
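As a rough sketch, such an environment can be recreated along these lines (the exact channels and the pip install of TensorFlow are my assumptions; adjust as needed):

```bash
conda create -n tf-gpu python=3.9
conda activate tf-gpu
conda install -c conda-forge cudatoolkit=11.2.2 cudnn=8.1.0
conda install -c nvidia cuda-nvcc=11.7.99
pip install tensorflow==2.9
```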
For the benefit of the community, adding @Zuk Levinson's comment:
The issue is solved by using tensorflow=2.3, cudnn=7.6.5 and cudatoolkit=10.1.
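A minimal sketch of that setup (the Python version and install channels are my assumptions):

```bash
conda create -n tf23-gpu python=3.8 cudatoolkit=10.1 cudnn=7.6.5
conda activate tf23-gpu
pip install tensorflow==2.3
```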
I've installed Anaconda and trained a neural network using (tf) Keras on a Windows machine. I'm able to use model.predict within the environment (Anaconda Navigator and Visual Studio Code), communicating with the outside world via input and output files. However, none of the tools I tried for creating an exe works: PyInstaller complains that there is "no module keras", py2exe mentions something is "beyond top level", while cx_Freeze refers to "intel mkl threads". Any hints other than to visit the astrologer to ask when and where to buy a new computer?
My goal is to set up my PC for machine learning and deep learning on my GPU. I've read about all the different components, but I cannot connect the dots as to what I need to do.
OS: Ubuntu 20.04
GPU: Nvidia RTX 2070 Super
Anaconda: 4.8.3
I've installed the nvidia-cuda-toolkit (10.1.243), but now what?
How does this integrate with jupyter notebook?
The Python modules I want to work with are:
turicreate - I've gotten this to run on the CPU but not the GPU
scikit-learn
tensorflow
matlab
I know cuDNN and pyCUDA fit in there somewhere.
Any help is appreciated. Thanks
First of all - my experience is limited to Ubuntu 18.04 and 16.xx and Python DL frameworks, but I hope some suggestions will be helpful.
If I were familiar with Docker, I would consider using it rather than setting everything up from scratch. This approach is described in the section about the TensorFlow container; a minimal sketch of that route follows below.
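Assuming the NVIDIA Container Toolkit is installed, running the GPU-enabled TensorFlow image and checking that it sees the GPU looks roughly like this:

```bash
# Pull the GPU-enabled TensorFlow image and verify that the container sees the GPU
docker run --gpus all -it --rm tensorflow/tensorflow:latest-gpu \
    python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
```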
If you decide to set up all the components yourself, please see this guideline.
I used parts of it for 18.04, successfully.
Be careful with automatic updates. After the configuration is finished and tested, protect it from being overwritten by a newer version of CUDA or TensorRT.
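On Ubuntu, one way to do that is to hold the relevant apt packages; a sketch, assuming the stock nvidia-cuda-toolkit package (adjust the names to whatever CUDA/TensorRT packages you actually installed):

```bash
# Prevent unattended upgrades from replacing the tested CUDA toolkit
sudo apt-mark hold nvidia-cuda-toolkit
```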
Answering one of your sub-questions - How does this integrate with Jupyter Notebook? - it does not, because it does not need to. The CUDA library cooperates with a framework such as TensorFlow, not with Jupyter. Jupyter is just an editor and execution controller on the server side.
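If you want to confirm from a notebook cell that the framework sees the GPU, a quick check (assuming TensorFlow 2.x) is:

```python
import tensorflow as tf

# An empty list here means the CUDA/cuDNN stack is not visible to TensorFlow.
print(tf.config.list_physical_devices('GPU'))
```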
When I try to run code using Keras and NumPy, or Keras and Matplotlib, in one Jupyter notebook, I always get the message: The kernel appears to have died. It will restart automatically.
When I run the code in two different notebooks, it works perfectly fine. I installed everything using Anaconda and I am on macOS. I would really appreciate an answer; everything else I've found and tried so far has not worked. Thank you!
When I run my TensorFlow training module in the PyCharm IDE on Ubuntu 16.04, it does not train on the GPU and falls back to the CPU. But when I run the same Python script from the terminal, it trains on the GPU. I want to know how to configure GPU training in the PyCharm IDE.
Actually, the problem was that the Python environment for the PyCharm project was not the same as the one selected in the run configuration. The issue was fixed by changing the environment in the run configuration.
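A quick diagnostic sketch (assuming TensorFlow 2.x) to confirm that both launch paths resolve to the same interpreter and see the GPU:

```python
import sys
import tensorflow as tf

# Should point inside the intended conda/virtualenv environment
# both when launched from PyCharm and from the terminal.
print(sys.executable)

# Should list at least one GPU when the CUDA-enabled build is active.
print(tf.config.list_physical_devices('GPU'))
```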