I have a question regarding TensorFlow that is somewhat critical to the task I'm trying to accomplish.
My scenario is as follows:
1. I have a TensorFlow script that has been set up, trained, and tested. It is working well.
The training and testing were done on a dev box with 2 Titan X cards.
We now need to port this system to a live-pilot testing stage and are required to deploy it on a virtual machine running Ubuntu 14.04.
Here lies the problem: the VM will not have access to the underlying GPUs and must validate the incoming data in CPU-only mode. My questions:
Will the absence of GPUs hinder the validation process of my ML system? Does TensorFlow use GPUs for CNN computation by default, and will the absence of a GPU affect execution?
How do I run my script in CPU-only mode?
Will setting CUDA_VISIBLE_DEVICES to none help with the validation in a CPU-only mode after the system has been trained on GPU boxes?
I'm sorry if this comes across as a noob question, but I am new to TF and any advice would be much appreciated. Please let me know if you need any further information about my scenario.
Testing with CUDA_VISIBLE_DEVICES set to an empty string will make sure that you don't have anything that depends on a GPU being present, and in theory it should be enough. In practice, there are some bugs in the GPU code path that can be triggered when no GPUs are present (like this one), so you want to make sure your GPU software environment (CUDA version) is the same.
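For example, a minimal sketch of that check (device_lib is the TF 1.x way of listing devices; adjust for your version):

# Hide all GPUs before TensorFlow is imported, then confirm that
# only CPU devices are visible to the runtime.
import os
os.environ["CUDA_VISIBLE_DEVICES"] = ""

from tensorflow.python.client import device_lib

devices = device_lib.list_local_devices()
print([d.name for d in devices])                     # should only show /cpu:0-style devices
assert all(d.device_type != "GPU" for d in devices)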
Alternatively, you could compile TensorFlow without GPU support (bazel build -c opt tensorflow); this way you don't have to worry about matching CUDA environments or setting CUDA_VISIBLE_DEVICES.
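A quick sanity check of which build you ended up with (sketch):

# A CPU-only build reports False here.
import tensorflow as tf
print(tf.test.is_built_with_cuda())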
Related
I am using machine learning with detecto in Python. However, whenever I run it, I get a warning saying
It looks like you're training your model on a CPU. Consider switching to a GPU; otherwise,
this method can take hours upon hours or even days to finish. For more information, see
https://detecto.readthedocs.io/en/latest/usage/quickstart.html#technical-requirements
I have a GPU in the form of an Intel(R) HD Graphics 4600, but for some reason the code is running on the CPU. I have checked out the link it gives, which says:
By default, Detecto will run all heavy-duty code on the GPU if it’s available and on the CPU otherwise.
It recommends using Google Colab if the computer doesn't have a GPU it can use, but I do have one and don't want to use Google Colab.
Why is it running on the CPU instead of the GPU? And how can I fix it? The part of my code where I get the warning is:
losses = fitmodel(loader, Test_dataset, epochs=25, lr_step_size=5,
                  learning_rate=0.001, verbose=True)
The code does work; however, it takes ages to run, so I want to be able to run it on the GPU to save time.
The GPU that Detecto is referring to would need to be a CUDA-capable NVIDIA GPU, so your Intel(R) HD Graphics 4600 does not meet this criterion.
Detecto uses PyTorch internally, whose GPU support is based on CUDA. So in order to use a GPU, you would need to move to a machine that has a CUDA-capable card.
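If in doubt, you can check what PyTorch itself sees; a minimal sketch:

# Detecto runs on the GPU only if PyTorch reports a usable CUDA device.
import torch

print(torch.cuda.is_available())        # False on an Intel HD Graphics 4600
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))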
I am currently working on 1D convolutional neural networks for time series classification. Recently, I got CUDA working on my GeForce 3080 (which was a pain in itself). However, I noticed some weird behavior when using TensorFlow and CUDA: after training a model, the GPU memory is not released, even after deleting the variables and running garbage collection. I tried resetting the TF graph and closing the TF sessions, but the GPU memory stays allocated. This results in cross-validation crashing, and I have to restart my Python environment every time I want to make changes and retrain my model.
After a tedious search, I found that people were already struggling with this 5 years ago. However, I am now using TF 2.7, working on Ubuntu 20.04.3. Some of my colleagues are using Windows and are not experiencing these problems; it seems they have no issues with models failing to retrain because of already-allocated memory.
I found the workaround of using multiple processes, but wasn't able to get it to work for my model with 10-fold cross-validation (a sketch of what I tried is below).
As the issue has been open for more than 5 years now and my colleagues aren't having any problems, I was wondering whether I am doing something wrong. The issue has most likely been fixed in the meantime, which is why I suspect my own code is the problem here.
Is there any solution or guide for TF 2.7 and GPU memory allocation?
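For reference, this is roughly the per-fold subprocess pattern I tried, sketched here with a dummy Conv1D model and random data rather than my actual pipeline (all names are placeholders):

# Each fold trains in its own process; the GPU memory is released when that process exits.
import multiprocessing as mp
import numpy as np

def run_fold(fold_idx, results):
    import tensorflow as tf  # import TF inside the child so CUDA is initialised per process
    x = np.random.rand(256, 128, 1).astype("float32")
    y = np.random.randint(0, 2, size=(256,))
    model = tf.keras.Sequential([
        tf.keras.layers.Conv1D(16, 3, activation="relu", input_shape=(128, 1)),
        tf.keras.layers.GlobalAveragePooling1D(),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    history = model.fit(x, y, epochs=1, verbose=0)
    results[fold_idx] = history.history["accuracy"][-1]

if __name__ == "__main__":
    ctx = mp.get_context("spawn")          # spawn: children do not inherit CUDA state
    with ctx.Manager() as manager:
        results = manager.dict()
        for fold in range(10):
            p = ctx.Process(target=run_fold, args=(fold, results))
            p.start()
            p.join()                       # GPU memory is freed when the child exits
        print(dict(results))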
I've seen several questions about GPU memory with TensorFlow, but I've installed it on a Pine64 with no GPU support.
That means I'm running it with very limited resources (CPU and RAM only), and TensorFlow seems to want it all, completely freezing my machine.
Is there a way to limit the amount of processing power and memory allocated to TensorFlow? Something similar to bazel's own --local_resources flag?
This will create a session that runs one op at a time, with only one thread per op:
import tensorflow as tf

sess = tf.Session(config=tf.ConfigProto(
    inter_op_parallelism_threads=1,
    intra_op_parallelism_threads=1))
I'm not sure about limiting memory; it seems to be allocated on demand. I've had TensorFlow freeze my machine when my network wanted 100 GB of RAM, so my solution was to build networks that need less RAM.
For TensorFlow 2.x, this has been answered in the following thread:
In TensorFlow 2.x, there is no session anymore. Instead, use the config API directly to set the parallelism at the start of the program.
import tensorflow as tf
tf.config.threading.set_intra_op_parallelism_threads(2)
tf.config.threading.set_inter_op_parallelism_threads(2)
with tf.device('/CPU:0'):
    model = tf.keras.models.Sequential([...])
https://www.tensorflow.org/api_docs/python/tf/config/threading
I have an Intel graphics card (Intel(R) HD Graphics 520; I'm also on Windows 10), and as far as I know I can't use CUDA unless I have an NVIDIA GPU. The purpose is to use Theano's GPU capabilities (for deep learning, which is why I need GPU power).
Is there a workaround that somehow allows me to use CUDA with my current GPU?
If not, is there another API that I can use with my current GPU for Theano (in Python 2.7)?
Or, as a last option, is there another language entirely, such as Java, with an API that allows me to use the GPU?
Figuring this out would be very helpful, because even though I just started with deep learning, I will probably get to the point where I need GPU parallel processing power to get results without waiting days at a minimum.
In order:
No. You must have a supported NVIDIA GPU to use CUDA.
As pointed out in the comments, there is an alternative backend for Theano which uses OpenCL and which might work on your GPU.
Intel supports OpenCL on your GPU, so any language bindings for the OpenCL APIs, or libraries with built-in OpenCL support, would be a possible solution in this case.
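For example, a minimal sketch with the pyopencl bindings (assuming Intel's OpenCL runtime/driver is installed):

# Enumerate OpenCL platforms and devices; the Intel HD Graphics 520 should
# show up here if the Intel OpenCL runtime is installed.
import pyopencl as cl

for platform in cl.get_platforms():
    for device in platform.get_devices():
        print(platform.name, "-", device.name)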
[This answer has been assembled from comments and added as a community wiki entry in order to get it off the unanswered queue for the CUDA tag].
I have been trying to find ways to enable parallel processing in Theano while training a neural network, but I can't seem to find any. Right now, when I train a network, Theano only uses a single core.
Also, I do not have access to a GPU, so if I could make Theano use all the cores on the machine, it would hopefully speed things up.
Any tips on speeding up Theano are very welcome!
This is what I have been able to figure out.
Follow the instructions on this page
http://deeplearning.net/software/theano/install_ubuntu.html
It seems that I had not installed BLAS properly, so I reinstalled everything according to the instructions on the website.
Theano has config flags that have to be set.
Also follow the discussion here: Why does multiprocessing use only a single core after I import numpy?
Putting all this together, when I run the script as
THEANO_FLAGS='openmp=True' OMP_NUM_THREADS=N OPENBLAS_MAIN_FREE=1 python <script>.py
# where N is the number of cores
Theano uses all the cores on my machine.
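To confirm the flags were actually picked up, a quick check (config attribute names may differ slightly between Theano versions):

# Run with the same THEANO_FLAGS / OMP_NUM_THREADS environment as the training script.
import os
import theano

print("openmp enabled:", theano.config.openmp)
print("OMP_NUM_THREADS:", os.environ.get("OMP_NUM_THREADS"))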